Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Security and Privacy of Hash-BasedSoftware ApplicationsThis work has been partially supported by the LabEx PERSYVAL-Lab(ANR-11-LABX-0025-01) funded by the French program Investissement drsquoavenir
Amrit Kumar
October 20 2016
Privatics team InriaUniversiteacute Grenoble Alpes
Hashing
bull A function h 0 1lowast rarr 0 1` where ` is the digest size
bull Cryptographic (second) pre-image and collision resistant
h
h(x)
pre-imageresistance
h
x
h(x)
2nd pre-imageresistance
h
h(x prime)
6=
=
h
h(x)
collisionresistance
h
h(x prime)
6=
=
2` 2` 2`2Best genericattack
2
Are collisions always bad (I)
A simple use case
bull Instead of storing n (large) data items store their digestsbull If ` is large collisions are hard to find rArr space required = ntimes ` bits
Collisions for further space savings
x1
x2
x3
xn
d1
di
dm
Data Digests
bull di now substitutes both x1 and x2 rArr space required lt n times ` bitsbull Caveat May introduce some unexpected behaviorbull Core of several efficient (probabilistic) data structures
bull Bloom filters for membership testingbull Sketches for data stream analysis 3
Are collisions always bad (II)
Use case in privacy
bull Hashing as a pseudonymization technique
bull If ` is large but identifiers n is enumerable (in reasonable time)bull Exhaustive search breaks pseudonymization
d
d1
d2
` is large ` is small
h
h
h
h
bull If ` is sufficiently smallbull On average n2` identifiers share the same pseudonymbull Notion of anonymity-setbull Caveat Provides weak anonymity guaranteesbull Employed in Google Safe Browsing a malicious URL detection tool 4
Contrasting perspectives and outline
Contrasting perspectives
bull Collisions have to be absolutely avoided in cryptography
bull Somewhat welcome in algorithms and data structures
bull Useful to some extent in the context of privacy
Goal Investigate the security and privacy implications of hash collisions
Focus for today
bull Security Bloom Filters1 The Power of Evil Choices in Bloom Filters DSNrsquo15
Joint work with T Gerbet and C Lauradoux2 Bloom Filters in Adversarial Settings Under submission
Joint work with C Lauradoux and P Lafourcade
bull Privacy Safe Browsing1 A Privacy Analysis of Google and Yandex Safe Browsing DSNrsquo16
Joint work with T Gerbet and C Lauradoux
5
Security Bloom Filters
Bloom filters [Bloom 1970]
Setup(m n k)
bull A binary vector ~z of size m compressing a set of n itemsbull k uniform and independent hash functions hi 0 1lowast rarr [0m minus 1]bull ~z initialized to ~0
Operations
bull Insert(x) Set bits of ~z at h1(x) hk(x) to 1
0 0 0 1 1 0 1 1 0 1
S = x1 x2 x3 k = 2
y1 isin S y2 = x2 y3 isin S(false positive)
bull Query(y) Return True if bits of ~z at h1(y) hk(y) are all 1bull False positive rate and its optimum value have been well studied 7
Our contributions
bull Define adversary models for Bloom filtersbull Query-only adversarybull Chosen-insertion adversarybull Deletion adversary
bull Specific to counting Bloom filters (not covered today)
bull DoS attacks on Bloom enabled software applicationsbull Increase false positive probabilitybull Increase query time
bull Worst-case analysis of Bloom filtersbull false-positive probabilitybull new filter parameters
bull Bloom hash tables as a potential replacement for Bloom filters
8
Query-only adversary
Capabilities Only queries to the filterAssumption State of the filter is known
Goals
bull Craft items that generate false positivesbull Probability to forge a false positive is
(wH (~z)
m
)k
wH(middot) is the Hamming weight
bull Or items whose processing leads to latencybull First k minus 1 bits are set to 1 and the k-th bit set to 0
1 0 1 0 1 1 0 1 1 0 1 0
y k = 3
bull The probability of finding such an item is
(m minus wH(~z)) middot(wH (~z)kminus1
)mk
9
Chosen-insertion adversary
Capabilities Can choose items to insert in the filterAssumption State of the filter is knownGoal Increase the false positive probability
Strategy Greedily insert x that maximizes bits set to 1
bull Each x sets k bits to 1
0 0 0 1 0 1 1 1 1 1 1 1
x1 x2 x3 x4
Impact
No attack Under attackbits set to 1 072nkopt nkopt
false positive rate (f ) 12kopt
(nkoptm
)kopt
10
Impact on a sample filter
Parameters m = 3200 n = 600 kopt = 4 fopt = 0077
0 100 200 300 400 500 600Number of inserted items
False positive probability
0
007
014
021
028
035
f advPartialf
fopt
insert last200 items
11
Applying adversary models
Factors enabling our attacks
bull Insecure hash functions
bull Digest truncation
bull High Bloom filter false positive rate
Vulnerable software applications
Software app Hashing Parameter info
Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10
sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014
12
Bypassing a forensic tool
NSRL forensic tool
bull A whitelist of ldquoknown safe filesrdquo
bull Stored and distributed as a Bloom filter
bull Maintained by NIST
A query-only attack Goal is to hide a contraband file
bull Adversary modifies the file to create a false positive
bull Modification should be easily reversible
bull The filter detects the file as safe
13
Countermeasure against chosen-insertion attacks
Use worst-case parameters for Bloom filters
bull Fix m n and choose k that minimizes false positive probability
f adv =
(nk
m
)k
Optimal values are
kadvopt =
m
enand f adv
opt = eminusmen
bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv
opt = 2 f advopt = 01
14
Summary of other attacks amp defenses
Attacks
Software app Attacks
Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only
Defenses
bull Use HMAC
bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table
15
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Hashing
bull A function h 0 1lowast rarr 0 1` where ` is the digest size
bull Cryptographic (second) pre-image and collision resistant
h
h(x)
pre-imageresistance
h
x
h(x)
2nd pre-imageresistance
h
h(x prime)
6=
=
h
h(x)
collisionresistance
h
h(x prime)
6=
=
2` 2` 2`2Best genericattack
2
Are collisions always bad (I)
A simple use case
bull Instead of storing n (large) data items store their digestsbull If ` is large collisions are hard to find rArr space required = ntimes ` bits
Collisions for further space savings
x1
x2
x3
xn
d1
di
dm
Data Digests
bull di now substitutes both x1 and x2 rArr space required lt n times ` bitsbull Caveat May introduce some unexpected behaviorbull Core of several efficient (probabilistic) data structures
bull Bloom filters for membership testingbull Sketches for data stream analysis 3
Are collisions always bad (II)
Use case in privacy
bull Hashing as a pseudonymization technique
bull If ` is large but identifiers n is enumerable (in reasonable time)bull Exhaustive search breaks pseudonymization
d
d1
d2
` is large ` is small
h
h
h
h
bull If ` is sufficiently smallbull On average n2` identifiers share the same pseudonymbull Notion of anonymity-setbull Caveat Provides weak anonymity guaranteesbull Employed in Google Safe Browsing a malicious URL detection tool 4
Contrasting perspectives and outline
Contrasting perspectives
bull Collisions have to be absolutely avoided in cryptography
bull Somewhat welcome in algorithms and data structures
bull Useful to some extent in the context of privacy
Goal Investigate the security and privacy implications of hash collisions
Focus for today
bull Security Bloom Filters1 The Power of Evil Choices in Bloom Filters DSNrsquo15
Joint work with T Gerbet and C Lauradoux2 Bloom Filters in Adversarial Settings Under submission
Joint work with C Lauradoux and P Lafourcade
bull Privacy Safe Browsing1 A Privacy Analysis of Google and Yandex Safe Browsing DSNrsquo16
Joint work with T Gerbet and C Lauradoux
5
Security Bloom Filters
Bloom filters [Bloom 1970]
Setup(m n k)
bull A binary vector ~z of size m compressing a set of n itemsbull k uniform and independent hash functions hi 0 1lowast rarr [0m minus 1]bull ~z initialized to ~0
Operations
bull Insert(x) Set bits of ~z at h1(x) hk(x) to 1
0 0 0 1 1 0 1 1 0 1
S = x1 x2 x3 k = 2
y1 isin S y2 = x2 y3 isin S(false positive)
bull Query(y) Return True if bits of ~z at h1(y) hk(y) are all 1bull False positive rate and its optimum value have been well studied 7
Our contributions
bull Define adversary models for Bloom filtersbull Query-only adversarybull Chosen-insertion adversarybull Deletion adversary
bull Specific to counting Bloom filters (not covered today)
bull DoS attacks on Bloom enabled software applicationsbull Increase false positive probabilitybull Increase query time
bull Worst-case analysis of Bloom filtersbull false-positive probabilitybull new filter parameters
bull Bloom hash tables as a potential replacement for Bloom filters
8
Query-only adversary
Capabilities Only queries to the filterAssumption State of the filter is known
Goals
bull Craft items that generate false positivesbull Probability to forge a false positive is
(wH (~z)
m
)k
wH(middot) is the Hamming weight
bull Or items whose processing leads to latencybull First k minus 1 bits are set to 1 and the k-th bit set to 0
1 0 1 0 1 1 0 1 1 0 1 0
y k = 3
bull The probability of finding such an item is
(m minus wH(~z)) middot(wH (~z)kminus1
)mk
9
Chosen-insertion adversary
Capabilities Can choose items to insert in the filterAssumption State of the filter is knownGoal Increase the false positive probability
Strategy Greedily insert x that maximizes bits set to 1
bull Each x sets k bits to 1
0 0 0 1 0 1 1 1 1 1 1 1
x1 x2 x3 x4
Impact
No attack Under attackbits set to 1 072nkopt nkopt
false positive rate (f ) 12kopt
(nkoptm
)kopt
10
Impact on a sample filter
Parameters m = 3200 n = 600 kopt = 4 fopt = 0077
0 100 200 300 400 500 600Number of inserted items
False positive probability
0
007
014
021
028
035
f advPartialf
fopt
insert last200 items
11
Applying adversary models
Factors enabling our attacks
bull Insecure hash functions
bull Digest truncation
bull High Bloom filter false positive rate
Vulnerable software applications
Software app Hashing Parameter info
Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10
sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014
12
Bypassing a forensic tool
NSRL forensic tool
bull A whitelist of ldquoknown safe filesrdquo
bull Stored and distributed as a Bloom filter
bull Maintained by NIST
A query-only attack Goal is to hide a contraband file
bull Adversary modifies the file to create a false positive
bull Modification should be easily reversible
bull The filter detects the file as safe
13
Countermeasure against chosen-insertion attacks
Use worst-case parameters for Bloom filters
bull Fix m n and choose k that minimizes false positive probability
f adv =
(nk
m
)k
Optimal values are
kadvopt =
m
enand f adv
opt = eminusmen
bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv
opt = 2 f advopt = 01
14
Summary of other attacks amp defenses
Attacks
Software app Attacks
Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only
Defenses
bull Use HMAC
bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table
15
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Are collisions always bad (I)
A simple use case
bull Instead of storing n (large) data items store their digestsbull If ` is large collisions are hard to find rArr space required = ntimes ` bits
Collisions for further space savings
x1
x2
x3
xn
d1
di
dm
Data Digests
bull di now substitutes both x1 and x2 rArr space required lt n times ` bitsbull Caveat May introduce some unexpected behaviorbull Core of several efficient (probabilistic) data structures
bull Bloom filters for membership testingbull Sketches for data stream analysis 3
Are collisions always bad (II)
Use case in privacy
bull Hashing as a pseudonymization technique
bull If ` is large but identifiers n is enumerable (in reasonable time)bull Exhaustive search breaks pseudonymization
d
d1
d2
` is large ` is small
h
h
h
h
bull If ` is sufficiently smallbull On average n2` identifiers share the same pseudonymbull Notion of anonymity-setbull Caveat Provides weak anonymity guaranteesbull Employed in Google Safe Browsing a malicious URL detection tool 4
Contrasting perspectives and outline
Contrasting perspectives
bull Collisions have to be absolutely avoided in cryptography
bull Somewhat welcome in algorithms and data structures
bull Useful to some extent in the context of privacy
Goal Investigate the security and privacy implications of hash collisions
Focus for today
bull Security Bloom Filters1 The Power of Evil Choices in Bloom Filters DSNrsquo15
Joint work with T Gerbet and C Lauradoux2 Bloom Filters in Adversarial Settings Under submission
Joint work with C Lauradoux and P Lafourcade
bull Privacy Safe Browsing1 A Privacy Analysis of Google and Yandex Safe Browsing DSNrsquo16
Joint work with T Gerbet and C Lauradoux
5
Security Bloom Filters
Bloom filters [Bloom 1970]
Setup(m n k)
bull A binary vector ~z of size m compressing a set of n itemsbull k uniform and independent hash functions hi 0 1lowast rarr [0m minus 1]bull ~z initialized to ~0
Operations
bull Insert(x) Set bits of ~z at h1(x) hk(x) to 1
0 0 0 1 1 0 1 1 0 1
S = x1 x2 x3 k = 2
y1 isin S y2 = x2 y3 isin S(false positive)
bull Query(y) Return True if bits of ~z at h1(y) hk(y) are all 1bull False positive rate and its optimum value have been well studied 7
Our contributions
bull Define adversary models for Bloom filtersbull Query-only adversarybull Chosen-insertion adversarybull Deletion adversary
bull Specific to counting Bloom filters (not covered today)
bull DoS attacks on Bloom enabled software applicationsbull Increase false positive probabilitybull Increase query time
bull Worst-case analysis of Bloom filtersbull false-positive probabilitybull new filter parameters
bull Bloom hash tables as a potential replacement for Bloom filters
8
Query-only adversary
Capabilities Only queries to the filterAssumption State of the filter is known
Goals
bull Craft items that generate false positivesbull Probability to forge a false positive is
(wH (~z)
m
)k
wH(middot) is the Hamming weight
bull Or items whose processing leads to latencybull First k minus 1 bits are set to 1 and the k-th bit set to 0
1 0 1 0 1 1 0 1 1 0 1 0
y k = 3
bull The probability of finding such an item is
(m minus wH(~z)) middot(wH (~z)kminus1
)mk
9
Chosen-insertion adversary
Capabilities Can choose items to insert in the filterAssumption State of the filter is knownGoal Increase the false positive probability
Strategy Greedily insert x that maximizes bits set to 1
bull Each x sets k bits to 1
0 0 0 1 0 1 1 1 1 1 1 1
x1 x2 x3 x4
Impact
No attack Under attackbits set to 1 072nkopt nkopt
false positive rate (f ) 12kopt
(nkoptm
)kopt
10
Impact on a sample filter
Parameters m = 3200 n = 600 kopt = 4 fopt = 0077
0 100 200 300 400 500 600Number of inserted items
False positive probability
0
007
014
021
028
035
f advPartialf
fopt
insert last200 items
11
Applying adversary models
Factors enabling our attacks
bull Insecure hash functions
bull Digest truncation
bull High Bloom filter false positive rate
Vulnerable software applications
Software app Hashing Parameter info
Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10
sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014
12
Bypassing a forensic tool
NSRL forensic tool
bull A whitelist of ldquoknown safe filesrdquo
bull Stored and distributed as a Bloom filter
bull Maintained by NIST
A query-only attack Goal is to hide a contraband file
bull Adversary modifies the file to create a false positive
bull Modification should be easily reversible
bull The filter detects the file as safe
13
Countermeasure against chosen-insertion attacks
Use worst-case parameters for Bloom filters
bull Fix m n and choose k that minimizes false positive probability
f adv =
(nk
m
)k
Optimal values are
kadvopt =
m
enand f adv
opt = eminusmen
bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv
opt = 2 f advopt = 01
14
Summary of other attacks amp defenses
Attacks
Software app Attacks
Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only
Defenses
bull Use HMAC
bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table
15
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Are collisions always bad (II)
Use case in privacy
bull Hashing as a pseudonymization technique
bull If ` is large but identifiers n is enumerable (in reasonable time)bull Exhaustive search breaks pseudonymization
d
d1
d2
` is large ` is small
h
h
h
h
bull If ` is sufficiently smallbull On average n2` identifiers share the same pseudonymbull Notion of anonymity-setbull Caveat Provides weak anonymity guaranteesbull Employed in Google Safe Browsing a malicious URL detection tool 4
Contrasting perspectives and outline
Contrasting perspectives
bull Collisions have to be absolutely avoided in cryptography
bull Somewhat welcome in algorithms and data structures
bull Useful to some extent in the context of privacy
Goal Investigate the security and privacy implications of hash collisions
Focus for today
bull Security Bloom Filters1 The Power of Evil Choices in Bloom Filters DSNrsquo15
Joint work with T Gerbet and C Lauradoux2 Bloom Filters in Adversarial Settings Under submission
Joint work with C Lauradoux and P Lafourcade
bull Privacy Safe Browsing1 A Privacy Analysis of Google and Yandex Safe Browsing DSNrsquo16
Joint work with T Gerbet and C Lauradoux
5
Security Bloom Filters
Bloom filters [Bloom 1970]
Setup(m n k)
bull A binary vector ~z of size m compressing a set of n itemsbull k uniform and independent hash functions hi 0 1lowast rarr [0m minus 1]bull ~z initialized to ~0
Operations
bull Insert(x) Set bits of ~z at h1(x) hk(x) to 1
0 0 0 1 1 0 1 1 0 1
S = x1 x2 x3 k = 2
y1 isin S y2 = x2 y3 isin S(false positive)
bull Query(y) Return True if bits of ~z at h1(y) hk(y) are all 1bull False positive rate and its optimum value have been well studied 7
Our contributions
bull Define adversary models for Bloom filtersbull Query-only adversarybull Chosen-insertion adversarybull Deletion adversary
bull Specific to counting Bloom filters (not covered today)
bull DoS attacks on Bloom enabled software applicationsbull Increase false positive probabilitybull Increase query time
bull Worst-case analysis of Bloom filtersbull false-positive probabilitybull new filter parameters
bull Bloom hash tables as a potential replacement for Bloom filters
8
Query-only adversary
Capabilities Only queries to the filterAssumption State of the filter is known
Goals
bull Craft items that generate false positivesbull Probability to forge a false positive is
(wH (~z)
m
)k
wH(middot) is the Hamming weight
bull Or items whose processing leads to latencybull First k minus 1 bits are set to 1 and the k-th bit set to 0
1 0 1 0 1 1 0 1 1 0 1 0
y k = 3
bull The probability of finding such an item is
(m minus wH(~z)) middot(wH (~z)kminus1
)mk
9
Chosen-insertion adversary
Capabilities Can choose items to insert in the filterAssumption State of the filter is knownGoal Increase the false positive probability
Strategy Greedily insert x that maximizes bits set to 1
bull Each x sets k bits to 1
0 0 0 1 0 1 1 1 1 1 1 1
x1 x2 x3 x4
Impact
No attack Under attackbits set to 1 072nkopt nkopt
false positive rate (f ) 12kopt
(nkoptm
)kopt
10
Impact on a sample filter
Parameters m = 3200 n = 600 kopt = 4 fopt = 0077
0 100 200 300 400 500 600Number of inserted items
False positive probability
0
007
014
021
028
035
f advPartialf
fopt
insert last200 items
11
Applying adversary models
Factors enabling our attacks
bull Insecure hash functions
bull Digest truncation
bull High Bloom filter false positive rate
Vulnerable software applications
Software app Hashing Parameter info
Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10
sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014
12
Bypassing a forensic tool
NSRL forensic tool
bull A whitelist of ldquoknown safe filesrdquo
bull Stored and distributed as a Bloom filter
bull Maintained by NIST
A query-only attack Goal is to hide a contraband file
bull Adversary modifies the file to create a false positive
bull Modification should be easily reversible
bull The filter detects the file as safe
13
Countermeasure against chosen-insertion attacks
Use worst-case parameters for Bloom filters
bull Fix m n and choose k that minimizes false positive probability
f adv =
(nk
m
)k
Optimal values are
kadvopt =
m
enand f adv
opt = eminusmen
bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv
opt = 2 f advopt = 01
14
Summary of other attacks amp defenses
Attacks
Software app Attacks
Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only
Defenses
bull Use HMAC
bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table
15
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Contrasting perspectives and outline
Contrasting perspectives
bull Collisions have to be absolutely avoided in cryptography
bull Somewhat welcome in algorithms and data structures
bull Useful to some extent in the context of privacy
Goal Investigate the security and privacy implications of hash collisions
Focus for today
bull Security Bloom Filters1 The Power of Evil Choices in Bloom Filters DSNrsquo15
Joint work with T Gerbet and C Lauradoux2 Bloom Filters in Adversarial Settings Under submission
Joint work with C Lauradoux and P Lafourcade
bull Privacy Safe Browsing1 A Privacy Analysis of Google and Yandex Safe Browsing DSNrsquo16
Joint work with T Gerbet and C Lauradoux
5
Security Bloom Filters
Bloom filters [Bloom 1970]
Setup(m n k)
bull A binary vector ~z of size m compressing a set of n itemsbull k uniform and independent hash functions hi 0 1lowast rarr [0m minus 1]bull ~z initialized to ~0
Operations
bull Insert(x) Set bits of ~z at h1(x) hk(x) to 1
0 0 0 1 1 0 1 1 0 1
S = x1 x2 x3 k = 2
y1 isin S y2 = x2 y3 isin S(false positive)
bull Query(y) Return True if bits of ~z at h1(y) hk(y) are all 1bull False positive rate and its optimum value have been well studied 7
Our contributions
bull Define adversary models for Bloom filtersbull Query-only adversarybull Chosen-insertion adversarybull Deletion adversary
bull Specific to counting Bloom filters (not covered today)
bull DoS attacks on Bloom enabled software applicationsbull Increase false positive probabilitybull Increase query time
bull Worst-case analysis of Bloom filtersbull false-positive probabilitybull new filter parameters
bull Bloom hash tables as a potential replacement for Bloom filters
8
Query-only adversary
Capabilities Only queries to the filterAssumption State of the filter is known
Goals
bull Craft items that generate false positivesbull Probability to forge a false positive is
(wH (~z)
m
)k
wH(middot) is the Hamming weight
bull Or items whose processing leads to latencybull First k minus 1 bits are set to 1 and the k-th bit set to 0
1 0 1 0 1 1 0 1 1 0 1 0
y k = 3
bull The probability of finding such an item is
(m minus wH(~z)) middot(wH (~z)kminus1
)mk
9
Chosen-insertion adversary
Capabilities Can choose items to insert in the filterAssumption State of the filter is knownGoal Increase the false positive probability
Strategy Greedily insert x that maximizes bits set to 1
bull Each x sets k bits to 1
0 0 0 1 0 1 1 1 1 1 1 1
x1 x2 x3 x4
Impact
No attack Under attackbits set to 1 072nkopt nkopt
false positive rate (f ) 12kopt
(nkoptm
)kopt
10
Impact on a sample filter
Parameters m = 3200 n = 600 kopt = 4 fopt = 0077
0 100 200 300 400 500 600Number of inserted items
False positive probability
0
007
014
021
028
035
f advPartialf
fopt
insert last200 items
11
Applying adversary models
Factors enabling our attacks
bull Insecure hash functions
bull Digest truncation
bull High Bloom filter false positive rate
Vulnerable software applications
Software app Hashing Parameter info
Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10
sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014
12
Bypassing a forensic tool
NSRL forensic tool
bull A whitelist of ldquoknown safe filesrdquo
bull Stored and distributed as a Bloom filter
bull Maintained by NIST
A query-only attack Goal is to hide a contraband file
bull Adversary modifies the file to create a false positive
bull Modification should be easily reversible
bull The filter detects the file as safe
13
Countermeasure against chosen-insertion attacks
Use worst-case parameters for Bloom filters
bull Fix m n and choose k that minimizes false positive probability
f adv =
(nk
m
)k
Optimal values are
kadvopt =
m
enand f adv
opt = eminusmen
bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv
opt = 2 f advopt = 01
14
Summary of other attacks amp defenses
Attacks
Software app Attacks
Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only
Defenses
bull Use HMAC
bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table
15
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Security Bloom Filters
Bloom filters [Bloom 1970]
Setup(m n k)
bull A binary vector ~z of size m compressing a set of n itemsbull k uniform and independent hash functions hi 0 1lowast rarr [0m minus 1]bull ~z initialized to ~0
Operations
bull Insert(x) Set bits of ~z at h1(x) hk(x) to 1
0 0 0 1 1 0 1 1 0 1
S = x1 x2 x3 k = 2
y1 isin S y2 = x2 y3 isin S(false positive)
bull Query(y) Return True if bits of ~z at h1(y) hk(y) are all 1bull False positive rate and its optimum value have been well studied 7
Our contributions
bull Define adversary models for Bloom filtersbull Query-only adversarybull Chosen-insertion adversarybull Deletion adversary
bull Specific to counting Bloom filters (not covered today)
bull DoS attacks on Bloom enabled software applicationsbull Increase false positive probabilitybull Increase query time
bull Worst-case analysis of Bloom filtersbull false-positive probabilitybull new filter parameters
bull Bloom hash tables as a potential replacement for Bloom filters
8
Query-only adversary
Capabilities Only queries to the filterAssumption State of the filter is known
Goals
bull Craft items that generate false positivesbull Probability to forge a false positive is
(wH (~z)
m
)k
wH(middot) is the Hamming weight
bull Or items whose processing leads to latencybull First k minus 1 bits are set to 1 and the k-th bit set to 0
1 0 1 0 1 1 0 1 1 0 1 0
y k = 3
bull The probability of finding such an item is
(m minus wH(~z)) middot(wH (~z)kminus1
)mk
9
Chosen-insertion adversary
Capabilities Can choose items to insert in the filterAssumption State of the filter is knownGoal Increase the false positive probability
Strategy Greedily insert x that maximizes bits set to 1
bull Each x sets k bits to 1
0 0 0 1 0 1 1 1 1 1 1 1
x1 x2 x3 x4
Impact
No attack Under attackbits set to 1 072nkopt nkopt
false positive rate (f ) 12kopt
(nkoptm
)kopt
10
Impact on a sample filter
Parameters m = 3200 n = 600 kopt = 4 fopt = 0077
0 100 200 300 400 500 600Number of inserted items
False positive probability
0
007
014
021
028
035
f advPartialf
fopt
insert last200 items
11
Applying adversary models
Factors enabling our attacks
bull Insecure hash functions
bull Digest truncation
bull High Bloom filter false positive rate
Vulnerable software applications
Software app Hashing Parameter info
Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10
sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014
12
Bypassing a forensic tool
NSRL forensic tool
bull A whitelist of ldquoknown safe filesrdquo
bull Stored and distributed as a Bloom filter
bull Maintained by NIST
A query-only attack Goal is to hide a contraband file
bull Adversary modifies the file to create a false positive
bull Modification should be easily reversible
bull The filter detects the file as safe
13
Countermeasure against chosen-insertion attacks
Use worst-case parameters for Bloom filters
bull Fix m n and choose k that minimizes false positive probability
f adv =
(nk
m
)k
Optimal values are
kadvopt =
m
enand f adv
opt = eminusmen
bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv
opt = 2 f advopt = 01
14
Summary of other attacks amp defenses
Attacks
Software app Attacks
Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only
Defenses
bull Use HMAC
bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table
15
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Bloom filters [Bloom 1970]
Setup(m n k)
bull A binary vector ~z of size m compressing a set of n itemsbull k uniform and independent hash functions hi 0 1lowast rarr [0m minus 1]bull ~z initialized to ~0
Operations
bull Insert(x) Set bits of ~z at h1(x) hk(x) to 1
0 0 0 1 1 0 1 1 0 1
S = x1 x2 x3 k = 2
y1 isin S y2 = x2 y3 isin S(false positive)
bull Query(y) Return True if bits of ~z at h1(y) hk(y) are all 1bull False positive rate and its optimum value have been well studied 7
Our contributions
bull Define adversary models for Bloom filtersbull Query-only adversarybull Chosen-insertion adversarybull Deletion adversary
bull Specific to counting Bloom filters (not covered today)
bull DoS attacks on Bloom enabled software applicationsbull Increase false positive probabilitybull Increase query time
bull Worst-case analysis of Bloom filtersbull false-positive probabilitybull new filter parameters
bull Bloom hash tables as a potential replacement for Bloom filters
8
Query-only adversary
Capabilities Only queries to the filterAssumption State of the filter is known
Goals
bull Craft items that generate false positivesbull Probability to forge a false positive is
(wH (~z)
m
)k
wH(middot) is the Hamming weight
bull Or items whose processing leads to latencybull First k minus 1 bits are set to 1 and the k-th bit set to 0
1 0 1 0 1 1 0 1 1 0 1 0
y k = 3
bull The probability of finding such an item is
(m minus wH(~z)) middot(wH (~z)kminus1
)mk
9
Chosen-insertion adversary
Capabilities Can choose items to insert in the filterAssumption State of the filter is knownGoal Increase the false positive probability
Strategy Greedily insert x that maximizes bits set to 1
bull Each x sets k bits to 1
0 0 0 1 0 1 1 1 1 1 1 1
x1 x2 x3 x4
Impact
No attack Under attackbits set to 1 072nkopt nkopt
false positive rate (f ) 12kopt
(nkoptm
)kopt
10
Impact on a sample filter
Parameters m = 3200 n = 600 kopt = 4 fopt = 0077
0 100 200 300 400 500 600Number of inserted items
False positive probability
0
007
014
021
028
035
f advPartialf
fopt
insert last200 items
11
Applying adversary models
Factors enabling our attacks
bull Insecure hash functions
bull Digest truncation
bull High Bloom filter false positive rate
Vulnerable software applications
Software app Hashing Parameter info
Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10
sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014
12
Bypassing a forensic tool
NSRL forensic tool
bull A whitelist of ldquoknown safe filesrdquo
bull Stored and distributed as a Bloom filter
bull Maintained by NIST
A query-only attack Goal is to hide a contraband file
bull Adversary modifies the file to create a false positive
bull Modification should be easily reversible
bull The filter detects the file as safe
13
Countermeasure against chosen-insertion attacks
Use worst-case parameters for Bloom filters
bull Fix m n and choose k that minimizes false positive probability
f adv =
(nk
m
)k
Optimal values are
kadvopt =
m
enand f adv
opt = eminusmen
bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv
opt = 2 f advopt = 01
14
Summary of other attacks amp defenses
Attacks
Software app Attacks
Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only
Defenses
bull Use HMAC
bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table
15
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Our contributions
bull Define adversary models for Bloom filtersbull Query-only adversarybull Chosen-insertion adversarybull Deletion adversary
bull Specific to counting Bloom filters (not covered today)
bull DoS attacks on Bloom enabled software applicationsbull Increase false positive probabilitybull Increase query time
bull Worst-case analysis of Bloom filtersbull false-positive probabilitybull new filter parameters
bull Bloom hash tables as a potential replacement for Bloom filters
8
Query-only adversary
Capabilities Only queries to the filterAssumption State of the filter is known
Goals
bull Craft items that generate false positivesbull Probability to forge a false positive is
(wH (~z)
m
)k
wH(middot) is the Hamming weight
bull Or items whose processing leads to latencybull First k minus 1 bits are set to 1 and the k-th bit set to 0
1 0 1 0 1 1 0 1 1 0 1 0
y k = 3
bull The probability of finding such an item is
(m minus wH(~z)) middot(wH (~z)kminus1
)mk
9
Chosen-insertion adversary
Capabilities Can choose items to insert in the filterAssumption State of the filter is knownGoal Increase the false positive probability
Strategy Greedily insert x that maximizes bits set to 1
bull Each x sets k bits to 1
0 0 0 1 0 1 1 1 1 1 1 1
x1 x2 x3 x4
Impact
No attack Under attackbits set to 1 072nkopt nkopt
false positive rate (f ) 12kopt
(nkoptm
)kopt
10
Impact on a sample filter
Parameters m = 3200 n = 600 kopt = 4 fopt = 0077
0 100 200 300 400 500 600Number of inserted items
False positive probability
0
007
014
021
028
035
f advPartialf
fopt
insert last200 items
11
Applying adversary models
Factors enabling our attacks
bull Insecure hash functions
bull Digest truncation
bull High Bloom filter false positive rate
Vulnerable software applications
Software app Hashing Parameter info
Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10
sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014
12
Bypassing a forensic tool
NSRL forensic tool
bull A whitelist of ldquoknown safe filesrdquo
bull Stored and distributed as a Bloom filter
bull Maintained by NIST
A query-only attack Goal is to hide a contraband file
bull Adversary modifies the file to create a false positive
bull Modification should be easily reversible
bull The filter detects the file as safe
13
Countermeasure against chosen-insertion attacks
Use worst-case parameters for Bloom filters
bull Fix m n and choose k that minimizes false positive probability
f adv =
(nk
m
)k
Optimal values are
kadvopt =
m
enand f adv
opt = eminusmen
bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv
opt = 2 f advopt = 01
14
Summary of other attacks amp defenses
Attacks
Software app Attacks
Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only
Defenses
bull Use HMAC
bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table
15
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Query-only adversary
Capabilities Only queries to the filterAssumption State of the filter is known
Goals
bull Craft items that generate false positivesbull Probability to forge a false positive is
(wH (~z)
m
)k
wH(middot) is the Hamming weight
bull Or items whose processing leads to latencybull First k minus 1 bits are set to 1 and the k-th bit set to 0
1 0 1 0 1 1 0 1 1 0 1 0
y k = 3
bull The probability of finding such an item is
(m minus wH(~z)) middot(wH (~z)kminus1
)mk
9
Chosen-insertion adversary
Capabilities Can choose items to insert in the filterAssumption State of the filter is knownGoal Increase the false positive probability
Strategy Greedily insert x that maximizes bits set to 1
bull Each x sets k bits to 1
0 0 0 1 0 1 1 1 1 1 1 1
x1 x2 x3 x4
Impact
No attack Under attackbits set to 1 072nkopt nkopt
false positive rate (f ) 12kopt
(nkoptm
)kopt
10
Impact on a sample filter
Parameters m = 3200 n = 600 kopt = 4 fopt = 0077
0 100 200 300 400 500 600Number of inserted items
False positive probability
0
007
014
021
028
035
f advPartialf
fopt
insert last200 items
11
Applying adversary models
Factors enabling our attacks
bull Insecure hash functions
bull Digest truncation
bull High Bloom filter false positive rate
Vulnerable software applications
Software app Hashing Parameter info
Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10
sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014
12
Bypassing a forensic tool
NSRL forensic tool
bull A whitelist of ldquoknown safe filesrdquo
bull Stored and distributed as a Bloom filter
bull Maintained by NIST
A query-only attack Goal is to hide a contraband file
bull Adversary modifies the file to create a false positive
bull Modification should be easily reversible
bull The filter detects the file as safe
13
Countermeasure against chosen-insertion attacks
Use worst-case parameters for Bloom filters
bull Fix m n and choose k that minimizes false positive probability
f adv =
(nk
m
)k
Optimal values are
kadvopt =
m
enand f adv
opt = eminusmen
bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv
opt = 2 f advopt = 01
14
Summary of other attacks amp defenses
Attacks
Software app Attacks
Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only
Defenses
bull Use HMAC
bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table
15
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Chosen-insertion adversary
Capabilities Can choose items to insert in the filterAssumption State of the filter is knownGoal Increase the false positive probability
Strategy Greedily insert x that maximizes bits set to 1
bull Each x sets k bits to 1
0 0 0 1 0 1 1 1 1 1 1 1
x1 x2 x3 x4
Impact
No attack Under attackbits set to 1 072nkopt nkopt
false positive rate (f ) 12kopt
(nkoptm
)kopt
10
Impact on a sample filter
Parameters m = 3200 n = 600 kopt = 4 fopt = 0077
0 100 200 300 400 500 600Number of inserted items
False positive probability
0
007
014
021
028
035
f advPartialf
fopt
insert last200 items
11
Applying adversary models
Factors enabling our attacks
bull Insecure hash functions
bull Digest truncation
bull High Bloom filter false positive rate
Vulnerable software applications
Software app Hashing Parameter info
Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10
sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014
12
Bypassing a forensic tool
NSRL forensic tool
bull A whitelist of ldquoknown safe filesrdquo
bull Stored and distributed as a Bloom filter
bull Maintained by NIST
A query-only attack Goal is to hide a contraband file
bull Adversary modifies the file to create a false positive
bull Modification should be easily reversible
bull The filter detects the file as safe
13
Countermeasure against chosen-insertion attacks
Use worst-case parameters for Bloom filters
bull Fix m n and choose k that minimizes false positive probability
f adv =
(nk
m
)k
Optimal values are
kadvopt =
m
enand f adv
opt = eminusmen
bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv
opt = 2 f advopt = 01
14
Summary of other attacks amp defenses
Attacks
Software app Attacks
Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only
Defenses
bull Use HMAC
bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table
15
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Impact on a sample filter
Parameters m = 3200 n = 600 kopt = 4 fopt = 0077
0 100 200 300 400 500 600Number of inserted items
False positive probability
0
007
014
021
028
035
f advPartialf
fopt
insert last200 items
11
Applying adversary models
Factors enabling our attacks
bull Insecure hash functions
bull Digest truncation
bull High Bloom filter false positive rate
Vulnerable software applications
Software app Hashing Parameter info
Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10
sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014
12
Bypassing a forensic tool
NSRL forensic tool
bull A whitelist of ldquoknown safe filesrdquo
bull Stored and distributed as a Bloom filter
bull Maintained by NIST
A query-only attack Goal is to hide a contraband file
bull Adversary modifies the file to create a false positive
bull Modification should be easily reversible
bull The filter detects the file as safe
13
Countermeasure against chosen-insertion attacks
Use worst-case parameters for Bloom filters
bull Fix m n and choose k that minimizes false positive probability
f adv =
(nk
m
)k
Optimal values are
kadvopt =
m
enand f adv
opt = eminusmen
bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv
opt = 2 f advopt = 01
14
Summary of other attacks amp defenses
Attacks
Software app Attacks
Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only
Defenses
bull Use HMAC
bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table
15
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Applying adversary models
Factors enabling our attacks
bull Insecure hash functions
bull Digest truncation
bull High Bloom filter false positive rate
Vulnerable software applications
Software app Hashing Parameter info
Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10
sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014
12
Bypassing a forensic tool
NSRL forensic tool
bull A whitelist of ldquoknown safe filesrdquo
bull Stored and distributed as a Bloom filter
bull Maintained by NIST
A query-only attack Goal is to hide a contraband file
bull Adversary modifies the file to create a false positive
bull Modification should be easily reversible
bull The filter detects the file as safe
13
Countermeasure against chosen-insertion attacks
Use worst-case parameters for Bloom filters
bull Fix m n and choose k that minimizes false positive probability
f adv =
(nk
m
)k
Optimal values are
kadvopt =
m
enand f adv
opt = eminusmen
bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv
opt = 2 f advopt = 01
14
Summary of other attacks amp defenses
Attacks
Software app Attacks
Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only
Defenses
bull Use HMAC
bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table
15
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Bypassing a forensic tool
NSRL forensic tool
bull A whitelist of ldquoknown safe filesrdquo
bull Stored and distributed as a Bloom filter
bull Maintained by NIST
A query-only attack Goal is to hide a contraband file
bull Adversary modifies the file to create a false positive
bull Modification should be easily reversible
bull The filter detects the file as safe
13
Countermeasure against chosen-insertion attacks
Use worst-case parameters for Bloom filters
bull Fix m n and choose k that minimizes false positive probability
f adv =
(nk
m
)k
Optimal values are
kadvopt =
m
enand f adv
opt = eminusmen
bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv
opt = 2 f advopt = 01
14
Summary of other attacks amp defenses
Attacks
Software app Attacks
Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only
Defenses
bull Use HMAC
bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table
15
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Countermeasure against chosen-insertion attacks
Use worst-case parameters for Bloom filters
bull Fix m n and choose k that minimizes false positive probability
f adv =
(nk
m
)k
Optimal values are
kadvopt =
m
enand f adv
opt = eminusmen
bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv
opt = 2 f advopt = 01
14
Summary of other attacks amp defenses
Attacks
Software app Attacks
Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only
Defenses
bull Use HMAC
bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table
15
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Summary of other attacks amp defenses
Attacks
Software app Attacks
Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only
Defenses
bull Use HMAC
bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table
15
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Related work
bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc
bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter
16
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Privacy Safe Browsing
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Google Safe Browsing in Mozilla Firefox
18
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
And many others
19
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Adverted privacy policy
ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT
ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton
Many Safe Browsing services are privacy unfriendly by design
ldquocannot determine the real URL from the informationreceivedrdquo mdash Google
bull Google seems to provide the most private service
bull Hence focus of this work
20
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Google Safe Browsing When Why and How
bull When In 2008 by Google
bull Goals Protect frombull Phishing sitesbull Malware sites
bull How Easy-to-use APIs in C Python and PHP
bull Methodology Blacklists
bull Available in
bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day
bull Cloned by Yandex as Yandex Safe Browsing 21
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Lookup API
bull Google harvests phishing and malware URLs to feed a blacklist
bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom
Issues
bull Does not scale Heavy network traffic
bull Privacy URLs are sent in clear
22
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Improving privacy using a local cache
Database
Server ClientLocal cache
(3) Conditionalquery
(4) Answer
(1) Query
(2) Answer
Extended client
bull Communication with the server is reduced
bull Better privacy
23
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Google Safe Browsing API (v3) Local cache
bull BlacklistsList name Description Entries
goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1
bull Does not handle URLs directly instead their SHA-256 digests
wwwevilcom SHA-256 cc7af8a31918
cc7af8a31918
32-bit prefix
bull Local cache contains prefixes 24
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Clientrsquos behavior chart
Userrsquos input
URLCanonicalize andcompute digest
Foundprefix
Get fulldigests
Founddigest
MaliciousURL
Non-maliciousURL
yes
no
yes
no
25
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Canonicalization and decompositions
bull Input URLhttpusrpwdabcport12extparam=1frags
bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password
bull Multiple decompositions are checked for a single URL
Decompositions of canonicalized URL
1 abc12extparam=12 abc12ext3 abc14 abc
5 bc12extparam=16 bc12ext7 bc18 bc
bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious
26
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Purpose of computing decompositions
Memory saving
bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix
A more intricate example
Unsafe link Safe link
dbcebc
bcabc2
abc1abc
bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc
27
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Privacy of Google Safe Browsing
ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy
Our goal A privacy analysis of Google and Yandex Safe Browsing
URL Prefix
wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3
bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set
28
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Tracking Safe Browsing users
Our assumptions
bull Google and Yandex have incentives to behave maliciously
bull Wish to learn whether a user visits some selected URLs
How Google can track Safe Browsing users
bull Builds a list of prefixes to track
bull Includes these prefixes in the clientrsquos local cache
bull Learns from the requests whether a user visited a specific URL
bull Key parameter Anonymity-set size
29
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Estimating anonymity-set size
bull Anonymity-set size of a prefix URLs that yield the prefix
Year URLs Domains
2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million
bull Estimate anonymity-set size Apply balls-into-bins model
Avg for URLs Avg for Domains
Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast
0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large
30
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Sending multiple prefixes
Example with two prefixes
Decomposition Prefix
petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
Intuitively
bull Prefix for petsymposiumorg is not enough for re-identification
bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix
bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification
31
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Ambiguity on two prefixes
bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache
URL Decomposition Prefix
Target URL abcabc A
bc B
Ambiguity
Type I gabcgabc C
abc A
bc B
Type II gbcgbc A larr collision on 32 bits
bc B
Type III defdef A larr collision on 32 bits
ef B larr collision on 32 bits
bull P[Type III] = 1264
bull Type II URLs exist only when decomp on the domain gt 232
bull P[Type I] gt P[Type II] gt P[Type III]
bull Mainly only Type I URLs create ambiguity in re-identification 32
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
How to track a given URL A real-world example (I)
petsymposiumorg2015
petsymposiumorg
petsymposiumorg2016linksphp
petsymposiumorg2016cfpphp
petsymposiumorg2016faqsphp
Type I URLs for target URL
petsymposiumorg2016
Goal Identify users interested in PETs
bull Target URL is petsymposiumorg2016
33
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
How to track a given URL A real-world example (II)
bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp
Decomposition Prefix
petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5
bull Including 2 prefixes in the local cache rArr Anonymity set size of 4
bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes
bull Server receives 2 prefixes rArr visited page is the target URL
bull Server receives 3 prefixes rArr visited page is either of the leaf URLs
bull The third prefix decides which leaf URL was visited
bull Generalizable to any number of prefixes 34
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Examples of URLs creating multiple hits
bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing
URL matching decomposition
Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf
17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp
httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues
1001cartesorgtag
Yandex
httpfrxhamstercomuservideofrxhamstercomxhamstercom
httpnlxhamstercomuservideonlxhamstercomxhamstercom
httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom
httpmmofoscomuserloginmmofoscommofoscom
httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom
teenslovehugecockscom
bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom
bull No need to add additional prefix for French or Dutch version35
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Responsible disclosure and impact
bull Disclosure to Mozilla Firefox
ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox
bull Disclosure to Yandex
ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team
bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)
ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy
36
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Conclusions amp Future Work
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Conclusions
Lesson learnt Collisions are hard to tame in security and privacy
Bloom filters
bull Developers tend to ignore the worst-case of algorithms
bull Data structures with ad-hoc crypto primitives are at the best risky
bull Need of secure instantiations eg as in Count-Min sketches
Safe Browsing
bull Re-establish the weakness of anonymity-set privacy model
bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so
38
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Future work
Bloom filters
bull On Bloom filters Bloom paradox [Rottenstreich 2015]
bull Beyond Bloom filters Security of Bloom filter variants
Safe Browsing
bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]
bull Privacy Can local cache improve Private Information Retrieval
39
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Thank you
40
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41
Other works not covered today
bull Performance of cryptographic accumulators
bull Private password auditing
bull (In)Security of Google and Yandex Safe Browsing
bull Alerting websites Risks and solutions
bull Decompression quines and anti-viruses
bull Pitfalls of hashing for privacy
bull Linkable (zero-knowledge) proofs for private and accountable gossip
41