Security and Privacy of Hash-Based Software Applications

This work has been partially supported by the LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01), funded by the French program Investissement d’avenir.

Amrit Kumar
October 20, 2016
Privatics team, Inria, Université Grenoble Alpes


Hashing

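• A function $h : \{0,1\}^* \to \{0,1\}^{\ell}$, where $\ell$ is the digest size

• Cryptographic: (second) pre-image and collision resistant
    • Pre-image resistance: given $h(x)$, find $x$. Best generic attack: $2^{\ell}$
    • 2nd pre-image resistance: given $x$, find $x' \neq x$ such that $h(x') = h(x)$. Best generic attack: $2^{\ell}$
    • Collision resistance: find any pair $x' \neq x$ such that $h(x') = h(x)$. Best generic attack: $2^{\ell/2}$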
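The gap between the $2^{\ell}$ and $2^{\ell/2}$ generic bounds can be observed on a truncated digest. The following is a minimal sketch, assuming SHA-256 truncated to its first $\ell$ bits as a stand-in for an $\ell$-bit hash function; the helper names `truncated_digest` and `find_collision` are illustrative and do not come from the talk.

```python
import hashlib
from itertools import count


def truncated_digest(data: bytes, bits: int) -> int:
    """Return the first `bits` bits of SHA-256(data) as an integer."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)


def find_collision(bits: int):
    """Birthday search: hash distinct inputs until two share a digest.

    Returns (x, x_prime, trials). About 2**(bits / 2) trials are needed
    on average, far fewer than the 2**bits of a generic pre-image search.
    """
    seen = {}
    for i in count():
        x = str(i).encode()
        d = truncated_digest(x, bits)
        if d in seen:
            return seen[d], x, i + 1
        seen[d] = x


x, x_prime, trials = find_collision(bits=24)
print(f"collision after {trials} trials: {x!r} and {x_prime!r}")
```

With a 24-bit digest the search typically stops after a few thousand trials, whereas inverting a given digest would take on the order of $2^{24}$ evaluations.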

Are collisions always bad? (I)

A simple use case

• Instead of storing $n$ (large) data items, store their digests
• If $\ell$ is large, collisions are hard to find ⇒ space required = $n \times \ell$ bits

Collisions for further space savings

[Figure: data items $x_1, x_2, x_3, \ldots, x_n$ mapped to digests $d_1, \ldots, d_i, \ldots, d_m$]

• $d_i$ now substitutes both $x_1$ and $x_2$ ⇒ space required < $n \times \ell$ bits
• Caveat: may introduce some unexpected behavior
• Core of several efficient (probabilistic) data structures
    • Bloom filters for membership testing
    • Sketches for data stream analysis

Are collisions always bad? (II)

Use case in privacy

• Hashing as a pseudonymization technique

• If $\ell$ is large but the set of $n$ identifiers is enumerable (in reasonable time)
    • Exhaustive search breaks pseudonymization

[Figure: identifiers mapped by $h$ to digests, contrasting the case where $\ell$ is large with the case where $\ell$ is small]

• If $\ell$ is sufficiently small
    • On average, $n/2^{\ell}$ identifiers share the same pseudonym
    • Notion of anonymity-set
    • Caveat: provides weak anonymity guarantees
    • Employed in Google Safe Browsing, a malicious URL detection tool
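Both regimes can be illustrated on a toy identifier space. This is a sketch under stated assumptions: the identifiers are hypothetical 4-digit codes ($n = 10000$), the pseudonym is a truncated SHA-256 digest, and the name `pseudonym` is illustrative.

```python
import hashlib
from collections import defaultdict


def pseudonym(identifier: str, bits: int = 256) -> int:
    """Pseudonym = first `bits` bits of SHA-256(identifier)."""
    full = int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big")
    return full >> (256 - bits)


# Hypothetical identifier space: all 4-digit codes (n = 10_000).
identifiers = [f"{i:04d}" for i in range(10_000)]

# Case 1: l is large. Exhaustive search over the identifiers inverts
# the pseudonym, even though the digest itself is 256 bits long.
target = pseudonym("4821")
recovered = [x for x in identifiers if pseudonym(x) == target]

# Case 2: l is small (8 bits). Many identifiers share each pseudonym,
# which forms an anonymity-set of average size about n / 2**8.
buckets = defaultdict(list)
for x in identifiers:
    buckets[pseudonym(x, bits=8)].append(x)
average_set_size = len(identifiers) / len(buckets)

print(recovered)
print(round(average_set_size, 1))
```

The first print shows that a large digest offers no protection when the inputs can be enumerated; the second shows that a short digest hides each identifier among a few dozen others.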

Contrasting perspectives and outline

Contrasting perspectives

• Collisions have to be absolutely avoided in cryptography

• Somewhat welcome in algorithms and data structures

• Useful to some extent in the context of privacy

Goal: investigate the security and privacy implications of hash collisions

Focus for today

• Security: Bloom filters
    1. The Power of Evil Choices in Bloom Filters, DSN’15. Joint work with T. Gerbet and C. Lauradoux
    2. Bloom Filters in Adversarial Settings, under submission. Joint work with C. Lauradoux and P. Lafourcade

• Privacy: Safe Browsing
    1. A Privacy Analysis of Google and Yandex Safe Browsing, DSN’16. Joint work with T. Gerbet and C. Lauradoux

Security: Bloom Filters

Bloom filters [Bloom 1970]

Setup(m, n, k)

• A binary vector $\vec{z}$ of size $m$ compressing a set of $n$ items
• $k$ uniform and independent hash functions $h_i : \{0,1\}^* \to [0, m-1]$
• $\vec{z}$ initialized to $\vec{0}$

Operations

• Insert(x): set the bits of $\vec{z}$ at $h_1(x), \ldots, h_k(x)$ to 1
• Query(y): return True if the bits of $\vec{z}$ at $h_1(y), \ldots, h_k(y)$ are all 1

[Figure: filter $\vec{z}$ = 0 0 0 1 1 0 1 1 0 1 for $S = \{x_1, x_2, x_3\}$ and $k = 2$; queries $y_1 \notin S$, $y_2 = x_2$ and $y_3 \notin S$ (false positive)]

• False positive rate and its optimum value have been well studied
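The setup and the two operations can be sketched in a few lines. This is a minimal illustration, not the implementation of any application discussed in the talk; it assumes that $h_i(x)$ = SHA-256($i \,\|\, x$) mod $m$ stands in for the $k$ uniform and independent hash functions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [0] * m  # the vector z, initialized to 0

    def _indexes(self, item: bytes):
        # Assumption: h_i(x) = SHA-256(i || x) mod m plays the role of
        # the k uniform and independent hash functions of the setup.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest, "big") % self.m

    def insert(self, item: bytes) -> None:
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def query(self, item: bytes) -> bool:
        # True means "possibly in the set"; False means "definitely not".
        return all(self.bits[idx] for idx in self._indexes(item))

    def weight(self) -> int:
        """Hamming weight of z (number of bits set to 1)."""
        return sum(self.bits)


bf = BloomFilter(m=3200, k=4)
for x in (b"x1", b"x2", b"x3"):
    bf.insert(x)
print(bf.query(b"x2"), bf.weight())
```

Inserted items are always reported as present; an item that was never inserted is reported as present only if all of its $k$ positions happen to be set by other items.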

Our contributions

• Define adversary models for Bloom filters
    • Query-only adversary
    • Chosen-insertion adversary
    • Deletion adversary
        • Specific to counting Bloom filters (not covered today)

• DoS attacks on Bloom-enabled software applications
    • Increase false positive probability
    • Increase query time

• Worst-case analysis of Bloom filters
    • False-positive probability
    • New filter parameters

• Bloom hash tables as a potential replacement for Bloom filters

Query-only adversary

Capabilities: only queries to the filter
Assumption: state of the filter is known

Goals

• Craft items that generate false positives
    • Probability to forge a false positive is

$$\left(\frac{w_H(\vec{z})}{m}\right)^{k}$$

      where $w_H(\cdot)$ is the Hamming weight

• Or items whose processing leads to latency
    • First $k - 1$ bits are set to 1 and the $k$-th bit is set to 0

[Figure: filter 1 0 1 0 1 1 0 1 1 0 1 0 queried with an item $y$, $k = 3$]

    • The probability of finding such an item is

$$\frac{\left(m - w_H(\vec{z})\right) \cdot w_H(\vec{z})^{k-1}}{m^{k}}$$
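The two probabilities are easy to evaluate for a concrete filter state. The sketch below applies them to the 12-bit filter drawn above with $k = 3$; the function names are illustrative.

```python
def false_positive_probability(weight: int, m: int, k: int) -> float:
    """(w_H(z) / m) ** k: all k probed bits are already set to 1."""
    return (weight / m) ** k


def latency_item_probability(weight: int, m: int, k: int) -> float:
    """(m - w_H(z)) * w_H(z) ** (k - 1) / m ** k.

    The first k - 1 probed bits are 1 and the k-th is 0, so a query that
    stops at the first 0 bit still has to evaluate all k hash functions.
    """
    return (m - weight) * weight ** (k - 1) / m ** k


# Filter state from the figure: 1 0 1 0 1 1 0 1 1 0 1 0, with k = 3.
z = [1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
m, w, k = len(z), sum(z), 3
p_fp = false_positive_probability(w, m, k)
p_latency = latency_item_probability(w, m, k)
print(round(p_fp, 4), round(p_latency, 4))
```

Here $w_H(\vec{z}) = 7$ and $m = 12$, so a random item is a false positive with probability $(7/12)^3$ and a worst-latency item with probability $5 \cdot 7^2 / 12^3$. An adversary who knows the filter state needs roughly the inverse of these probabilities in offline trials to craft one such item.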

Chosen-insertion adversary

Capabilities: can choose items to insert in the filter
Assumption: state of the filter is known
Goal: increase the false positive probability

Strategy: greedily insert the $x$ that maximizes the number of bits set to 1

• Each $x$ sets $k$ bits to 1

[Figure: filter 0 0 0 1 0 1 1 1 1 1 1 1 with inserted items $x_1, x_2, x_3, x_4$]

Impact

                             No attack              Under attack
Bits set to 1                $0.72\,n k_{opt}$      $n k_{opt}$
False positive rate ($f$)    $(1/2)^{k_{opt}}$      $(n k_{opt}/m)^{k_{opt}}$
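The greedy strategy can be simulated on a toy filter. This is a sketch under stated assumptions: the hash family is SHA-256($i \,\|\, x$) mod $m$, the adversary searches a pool of 200 candidate items per insertion (an arbitrary choice), and only 50 items are inserted so that the run stays fast.

```python
import hashlib


def indexes(item: bytes, m: int, k: int):
    """Assumed hash family: h_i(x) = SHA-256(i || x) mod m."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + item).digest(), "big") % m
        for i in range(k)
    ]


def insert(bits: list, item: bytes, k: int) -> None:
    for idx in indexes(item, len(bits), k):
        bits[idx] = 1


def greedy_choice(bits: list, candidates: list, k: int) -> bytes:
    """Pick the candidate that sets the most bits currently at 0."""
    def gain(item: bytes) -> int:
        return len({i for i in indexes(item, len(bits), k) if not bits[i]})
    return max(candidates, key=gain)


m, k, n = 3200, 4, 50
honest = [0] * m
attacked = [0] * m
for j in range(n):
    insert(honest, f"honest-{j}".encode(), k)
    # The adversary tries 200 candidate items before each insertion.
    pool = [f"adv-{j}-{c}".encode() for c in range(200)]
    insert(attacked, greedy_choice(attacked, pool, k), k)

print(sum(honest), sum(attacked), n * k)
```

Under attack every insertion sets $k$ fresh bits, so the weight reaches $nk$, while honest insertions lose some bits to overlaps. The false positive rate grows with the weight as $(w_H(\vec{z})/m)^k$.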

Impact on a sample filter

Parameters: $m = 3200$, $n = 600$, $k_{opt} = 4$, $f_{opt} = 0.077$

[Figure: false positive probability (axis from 0 to 0.35) as a function of the number of inserted items (0 to 600); curves $f^{adv}$, Partial and $f$, with the level $f_{opt}$ marked; annotation: insert last 200 items]

Applying adversary models

Factors enabling our attacks

• Insecure hash functions

• Digest truncation

• High Bloom filter false positive rate

Vulnerable software applications

Software    Application      Hashing       Parameter info
Scrapy      Web crawler      NA            NA
Dablooms    Spam filter      MurmurHash    $n = 100000$, $f = 0.057$
Squid       Web proxy        MD5           $f = 0.09$, $k = 4$
AIEngine    NIDS             C++ hash      $\ell = 13$, $n = 5000$, $f = 0.45$
NSRL        Forensic tool    SHA-1         $\ell = 32$, $n \approx 14 \times 10^{6}$, $f = 8.08 \times 10^{-10}$
sdhash      Forensic tool    SHA-1         $\ell = 11$, $n = 128$, $f = 0.0014$

Bypassing a forensic tool

NSRL forensic tool

• A whitelist of “known safe files”

• Stored and distributed as a Bloom filter

• Maintained by NIST

A query-only attack: the goal is to hide a contraband file

• Adversary modifies the file to create a false positive

• Modification should be easily reversible

• The filter detects the file as safe

Countermeasure against chosen-insertion attacks

Use worst-case parameters for Bloom filters

• Fix $m$, $n$ and choose the $k$ that minimizes the false positive probability

$$f^{adv} = \left(\frac{nk}{m}\right)^{k}$$

Optimal values are

$$k^{adv}_{opt} = \frac{m}{e\,n} \quad \text{and} \quad f^{adv}_{opt} = e^{-m/(e\,n)}$$

• Impact on a sample Bloom filter with $m = 3200$, $n = 600$
    • Average case: $k_{opt} = 4$, $f_{opt} = 0.077$
    • Worst case: $k^{adv}_{opt} = 2$, $f^{adv}_{opt} = 0.1$
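The parameters of the sample filter can be recomputed from the formulas. The sketch below assumes the classical average-case expressions $k_{opt} = (m/n)\ln 2$ and $f = (1 - e^{-kn/m})^k$, which are standard but not written out above, and rounds $k$ to the nearest integer; the function names are illustrative.

```python
import math


def average_case(m: int, n: int):
    """Classical optimum: k = (m / n) ln 2 and f = (1 - e^{-kn/m})^k."""
    k = max(1, round(m / n * math.log(2)))
    return k, (1 - math.exp(-k * n / m)) ** k


def worst_case(m: int, n: int):
    """Adversarial optimum: k = m / (e n) and f_adv = (n k / m)^k."""
    k = max(1, round(m / (math.e * n)))
    return k, (n * k / m) ** k


m, n = 3200, 600
k_opt, f_opt = average_case(m, n)
k_adv, f_adv = worst_case(m, n)
# False positive rate when average-case parameters face the attack.
f_adv_at_k_opt = (n * k_opt / m) ** k_opt
print(k_opt, round(f_opt, 3))
print(k_adv, round(f_adv, 3), round(f_adv_at_k_opt, 3))
```

For this filter the computation gives $k_{opt} = 4$ and $k^{adv}_{opt} = 2$. With $k = 4$ the attack drives the false positive rate to $(0.75)^4 \approx 0.32$, whereas with $k = 2$ it stays at $(0.375)^2 \approx 0.14$, which is of the same order as the 0.1 quoted above.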

Summary of other attacks & defenses

Attacks

Software    Application      Attacks
Scrapy      Web crawler      chosen-insertion, query-only
Dablooms    Spam filter      chosen-insertion, deletion
Squid       Web proxy        chosen-insertion, query-only
AIEngine    NIDS             query-only
sdhash      Forensic tool    query-only

Defenses

• Use HMAC

• Use an alternate data structure: Bloom hash tables [Bloom 1970]
    • Resist chosen-insertion attacks better
    • Are often more memory efficient than Bloom filters
    • On average $O(k)$ hash computations for items not in the table
    • On average $O(\ln k)$ for items in the table
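The HMAC defense amounts to deriving the filter positions under a secret key, so that an adversary who knows the filter state still cannot predict which bits an item touches. A minimal sketch, assuming HMAC-SHA-256 with the hash index prepended to the item; the function name `keyed_indexes` and this particular construction are illustrative choices, not the exact scheme of the talk.

```python
import hashlib
import hmac
import os


def keyed_indexes(key: bytes, item: bytes, m: int, k: int):
    """Derive k filter positions from HMAC-SHA-256 under a secret key."""
    positions = []
    for i in range(k):
        tag = hmac.new(key, bytes([i]) + item, hashlib.sha256).digest()
        positions.append(int.from_bytes(tag, "big") % m)
    return positions


key = os.urandom(32)  # kept secret by the party that owns the filter
print(keyed_indexes(key, b"http://example.com/", m=3200, k=4))
```

The positions are deterministic for a fixed key, so Insert and Query remain consistent, but they change completely under a different key. The cost is that the filter can only be queried by parties holding the key, which does not suit filters that are distributed publicly.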

Related work

• Algorithmic complexity attacks [Crosby et al. 2003]
    • DoS attacks against hash tables
    • Force hash tables to operate in $O(n)$ instead of $O(1)$
    • Similar attacks against skip-lists, regular expressions, etc.

• Independent work on Bloom filters [Naor et al. 2015]
    • Provide a theoretical framework
    • Study a query-only adversary that can only adaptively query the filter

Privacy: Safe Browsing

Google Safe Browsing in Mozilla Firefox

And many others

Advertised privacy policies

“We collect visited web pages, clickstream data or web address accessed, browser identifier and user ID” - WOT

“collects information including IP address, the origin of the search and may share this info with a third party” - Norton

Many Safe Browsing services are privacy unfriendly by design

“cannot determine the real URL from the information received” - Google

• Google seems to provide the most private service

• Hence the focus of this work

Google Safe Browsing: When, Why and How

• When: in 2008, by Google

• Goals: protect from
    • Phishing sites
    • Malware sites

• How: easy-to-use APIs in C, Python and PHP

• Methodology: blacklists

• Available in: [Figure: product logos]

• Impact
    • Billions of users
    • Detects thousands of new malicious websites per day

• Cloned by Yandex as Yandex Safe Browsing

Lookup API

• Google harvests phishing and malware URLs to feed a blacklist

• Client checks the status of a URL (e.g. example.com) using a simple HTTP GET/POST request to sb-ssl.google.com/safebrowsing/api/lookup

Issues

• Does not scale: heavy network traffic

• Privacy: URLs are sent in clear

Improving privacy using a local cache

[Figure: a server with its database, and an extended client made of the client and a local cache; steps: (1) Query, (2) Answer, (3) Conditional query, (4) Answer]

• Communication with the server is reduced

• Better privacy

Google Safe Browsing API (v3): local cache

• Blacklists

List name                  Description          Entries
goog-malware-shavar        malware              317807
googpub-phish-shavar       phishing             312621
goog-regtest-shavar        test file            29667
goog-unwanted-shavar       unwanted software
goog-whitedomain-shavar    unused               1

• Does not handle URLs directly; it handles their SHA-256 digests instead

[Figure: www.evil.com → SHA-256 → digest cc7af8a31918…, whose first 32 bits, cc7af8a3, form the 32-bit prefix]

• Local cache contains prefixes
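Computing a digest and its 32-bit prefix takes one call to SHA-256. This is a minimal sketch; the exact canonical string that the client hashes (for example whether a trailing slash is present) is an assumption here, so the printed value need not match the prefix shown in the figure.

```python
import hashlib


def digest_and_prefix(expression: str, prefix_bytes: int = 4):
    """Return (full SHA-256 digest, 32-bit prefix) of a URL expression."""
    digest = hashlib.sha256(expression.encode("utf-8")).digest()
    return digest, digest[:prefix_bytes]


# Assumed canonical form of the example URL.
digest, prefix = digest_and_prefix("www.evil.com/")
print(digest.hex())
print(prefix.hex())  # 8 hexadecimal characters = 32 bits
```

Only the 4-byte prefix is stored in the local cache; the 32-byte digest is compared later against the full digests returned by the server.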

Client’s behavior chart

[Figure: flowchart of the client’s behavior]

1. User’s input: URL
2. Canonicalize and compute digest
3. Found prefix? If no: non-malicious URL
4. If yes: get full digests
5. Found digest? If yes: malicious URL. If no: non-malicious URL
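The chart translates into a short lookup routine. This is a sketch under stated assumptions: canonicalization is taken as already done, `local_prefixes` models the local cache, and `get_full_digests` is a hypothetical callback standing in for the request sent to the server.

```python
import hashlib


def sha256(expression: str) -> bytes:
    return hashlib.sha256(expression.encode("utf-8")).digest()


def is_malicious(expressions, local_prefixes, get_full_digests) -> bool:
    """Follow the chart for the already-canonicalized expressions of a URL.

    `local_prefixes` is the set of 32-bit prefixes in the local cache and
    `get_full_digests(prefix)` is a hypothetical callback that returns the
    full digests the server holds for that prefix.
    """
    for expression in expressions:
        digest = sha256(expression)
        prefix = digest[:4]
        if prefix not in local_prefixes:
            continue  # no prefix hit: nothing is sent to the server
        if digest in get_full_digests(prefix):
            return True  # full digest found: malicious URL
    return False  # non-malicious URL


# Toy server-side blacklist holding one malicious expression.
blacklist = {sha256("evil.example/")}
local_cache = {d[:4] for d in blacklist}
server = lambda prefix: {d for d in blacklist if d[:4] == prefix}

print(is_malicious(["evil.example/"], local_cache, server))
print(is_malicious(["good.example/"], local_cache, server))
```

The server is contacted only on a prefix hit, which is what reduces traffic compared with the Lookup API. It is also the only point at which the server learns anything about the visited URL.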

Canonicalization and decompositions

• Input URL: http://usr:pwd@a.b.c:port/1/2.ext?param=1#frags

• Canonicalize(Input URL) → http://a.b.c/1/2.ext?param=1
    • Canonicalization helps privacy too: it removes the username and password

• Multiple decompositions are checked for a single URL

Decompositions of the canonicalized URL

1. a.b.c/1/2.ext?param=1
2. a.b.c/1/2.ext
3. a.b.c/1/
4. a.b.c/
5. b.c/1/2.ext?param=1
6. b.c/1/2.ext
7. b.c/1/
8. b.c/

• Each matching prefix is sent to the server
• Any matching full digest ⇒ initial URL is malicious
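The eight decompositions above can be generated mechanically from the host, path and query of the canonicalized URL. The following is a simplified sketch: the function name is illustrative, and the deployed client also bounds how many host suffixes and path prefixes it tries, which is omitted here.

```python
def decompositions(host: str, path: str, query: str = ""):
    """List host-suffix/path-prefix combinations of a canonicalized URL."""
    labels = host.split(".")
    # Host suffixes with at least two labels: a.b.c, then b.c.
    hosts = [".".join(labels[i:]) for i in range(max(1, len(labels) - 1))]

    paths = []
    if query:
        paths.append(f"{path}?{query}")  # exact path with the query
    paths.append(path)                   # exact path without the query
    # Directory prefixes, from the longest down to the root "/".
    parts = [p for p in path.split("/") if p]
    for depth in range(len(parts) - 1, -1, -1):
        prefix = "/" + "".join(p + "/" for p in parts[:depth])
        if prefix not in paths:
            paths.append(prefix)

    return [h + p for h in hosts for p in paths]


for i, d in enumerate(decompositions("a.b.c", "/1/2.ext", "param=1"), 1):
    print(i, d)
```

Each decomposition is then hashed and its 32-bit prefix looked up in the local cache, so a single page visit can produce several prefix hits.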

Purpose of computing decompositions

Memory saving

• A domain which hosts only malicious URLs
    • Naive blacklisting: include all malicious prefixes in the local cache
    • Memory-efficient blacklisting: include only the domain prefix

A more intricate example

[Figure: tree of links under b.c/, each marked as an unsafe link or a safe link: d.b.c/, e.b.c/, b.c/, a.b.c/2, a.b.c/1, a.b.c/]

• Naive blacklisting: include a.b.c/1, a.b.c/2 and d.b.c/
• Memory-efficient blacklisting: include only a.b.c/ and d.b.c/

Privacy of Google Safe Browsing

“Google cannot determine the real URL from the information received” - Google Safe Browsing v3 privacy policy

Our goal: a privacy analysis of Google and Yandex Safe Browsing

URL                              Prefix
www.evil.com/                    cc7af8a3
www.example-1.com/11893474       cc7af8a3
www.example-2.com/5234456210     cc7af8a3
www.example-3.com/616445242      cc7af8a3

• Privacy due to anonymity-set
    • Estimate the anonymity-set size
    • Does it suffice to have a large anonymity-set?

Tracking Safe Browsing users

Our assumptions

• Google and Yandex have incentives to behave maliciously

• They wish to learn whether a user visits some selected URLs

How Google can track Safe Browsing users

• Builds a list of prefixes to track

• Includes these prefixes in the client’s local cache

• Learns from the requests whether a user visited a specific URL

• Key parameter: anonymity-set size

Estimating anonymity-set size

• Anonymity-set size of a prefix: number of URLs that yield the prefix

Year    URLs           Domains
2008    1 trillion     177 million
2012    30 trillion    252 million
2013    60 trillion    271 million

• Estimate the anonymity-set size: apply the balls-into-bins model

                        Avg. for URLs                  Avg. for domains
Prefix length (bits)    2008      2012      2013       2008    2012    2013
16                      $2^{23}$  $2^{28}$  $2^{29}$   2700    3845    4135
32                      232       6984      13969      0.04    0.05    0.06
64                      0*        0*        0*         0*      0*      0*

0* is very close to 0

• Domains and URLs cannot be distinguished
• Anonymity-set size seems to be large
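In the balls-into-bins model, $n$ items thrown uniformly into $2^{\ell}$ prefixes give an average of $n / 2^{\ell}$ items per prefix. The sketch below recomputes the averages; it assumes the URL counts are in trillions ($10^{12}$), which is what reproduces the 32-bit averages of 232, 6984 and 13969, and the function name is illustrative.

```python
def average_anonymity_set(items: int, prefix_bits: int) -> float:
    """Balls into bins: n items spread uniformly over 2**l prefixes."""
    return items / 2 ** prefix_bits


# Assumed counts: URLs in trillions (10**12), domains in millions.
urls = {2008: 10**12, 2012: 30 * 10**12, 2013: 60 * 10**12}
domains = {2008: 177 * 10**6, 2012: 252 * 10**6, 2013: 271 * 10**6}

for bits in (16, 32, 64):
    url_avgs = [average_anonymity_set(n, bits) for n in urls.values()]
    dom_avgs = [average_anonymity_set(n, bits) for n in domains.values()]
    print(bits, [f"{a:.3g}" for a in url_avgs], [f"{a:.3g}" for a in dom_avgs])
```

For 32-bit prefixes a prefix is shared by hundreds to thousands of URLs on average but by far fewer than one domain, so a prefix computed on a domain is close to unique.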

Sending multiple prefixes

Example with two prefixes

Decomposition                     Prefix
petsymposium.org/2016/cfp.php     0xe70ee6d1
petsymposium.org/2016/            0x1d13ba6a
petsymposium.org/                 0x33a02ef5

Intuitively

• Prefix for petsymposium.org/ is not enough for re-identification

• Sending two 32-bit prefixes, for petsymposium.org/ and petsymposium.org/2016/, ≈ sending one 64-bit prefix

• The maximum anonymity-set size for 64-bit prefixes is 1 ⇒ should lead to re-identification

Ambiguity on two prefixes

• More than two distinct URLs may yield the same two prefixes
• Consider a user visiting a.b.c/ with prefixes A and B in the local cache

              URL         Decomposition    Prefix
Target URL    a.b.c/      a.b.c/           A
                          b.c/             B

Ambiguity
Type I        g.a.b.c/    g.a.b.c/         C
                          a.b.c/           A
                          b.c/             B
Type II       g.b.c/      g.b.c/           A  ← collision on 32 bits
                          b.c/             B
Type III      d.e.f/      d.e.f/           A  ← collision on 32 bits
                          e.f/             B  ← collision on 32 bits

• $P[\text{Type III}] = 1/2^{64}$

• Type II URLs exist only when the number of decompositions on the domain > $2^{32}$

• $P[\text{Type I}] > P[\text{Type II}] > P[\text{Type III}]$

• Mainly, only Type I URLs create ambiguity in re-identification

How to track a given URL A real-world example (I)

[Figure: URL tree rooted at petsymposium.org/, with petsymposium.org/2015/ and petsymposium.org/2016/ below it; the leaves petsymposium.org/2016/links.php, petsymposium.org/2016/cfp.php and petsymposium.org/2016/faqs.php are the Type I URLs for the target URL petsymposium.org/2016/]

Goal: identify users interested in PETs

• Target URL is petsymposium.org/2016/

How to track a given URL A real-world example (II)

• Target URL has Type I ambiguity with cfp.php, links.php and faqs.php

Decomposition                      Prefix
petsymposium.org/2016/faqs.php     0xaec10b3a
petsymposium.org/2016/             0x1d13ba6a
petsymposium.org/                  0x33a02ef5

• Including 2 prefixes in the local cache ⇒ anonymity-set size of 4

• To remove any ambiguity
    • Need to include 3 additional prefixes, for cfp.php, links.php and faqs.php
    • A total of 5 prefixes

• Server receives 2 prefixes ⇒ visited page is the target URL

• Server receives 3 prefixes ⇒ visited page is one of the leaf URLs

• The third prefix decides which leaf URL was visited

• Generalizable to any number of prefixes
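The server-side reasoning with 5 tracked prefixes can be written down as a small decision procedure. This is a sketch for analysis purposes under stated assumptions: prefixes are recomputed locally with SHA-256 over an assumed canonical form, so they need not equal the hexadecimal values in the table, and all names are illustrative.

```python
import hashlib


def prefix(expression: str) -> bytes:
    return hashlib.sha256(expression.encode("utf-8")).digest()[:4]


ROOT = "petsymposium.org/"
TARGET = "petsymposium.org/2016/"
LEAVES = [TARGET + page for page in ("cfp.php", "links.php", "faqs.php")]

# Prefixes placed in the local cache: 2 for the target, 3 for its leaves.
tracked = {prefix(e): e for e in [ROOT, TARGET, *LEAVES]}


def reidentify(received):
    """Map the set of prefixes received by the server to a visited URL."""
    hits = {tracked[p] for p in received if p in tracked}
    leaf_hits = [e for e in LEAVES if e in hits]
    if {ROOT, TARGET} <= hits and len(leaf_hits) == 1:
        return leaf_hits[0]  # 3 prefixes: the third one names the leaf
    if hits == {ROOT, TARGET}:
        return TARGET        # exactly 2 prefixes: the target itself
    return None              # anything else stays ambiguous


# Decompositions checked by a client visiting the FAQ page.
visit = ["petsymposium.org/2016/faqs.php", TARGET, ROOT]
print(reidentify({prefix(e) for e in visit}))
```

A client visiting the target page hits only the two shorter prefixes, while a client visiting a leaf page hits three, which is why the number of received prefixes already separates the two cases.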

Examples of URLs creating multiple hits

• Over 1300 such URLs, distributed over 30 domains
• More frequent in Yandex than in Google Safe Browsing

Google

URL                                                 Matching decompositions
http://wps3b.17buddies.net/wp/cs_sub_7-2.pwf        17buddies.net/wp/cs_sub_7-2.pwf
                                                    17buddies.net/wp/
http://www.1001cartes.org/tag/emergency-issues      1001cartes.org/tag/emergency-issues
                                                    1001cartes.org/tag/

Yandex

URL                                                 Matching decompositions
http://fr.xhamster.com/user/video                   fr.xhamster.com/
                                                    xhamster.com/
http://nl.xhamster.com/user/video                   nl.xhamster.com/
                                                    xhamster.com/
http://m.wickedpictures.com/user/login              m.wickedpictures.com/
                                                    wickedpictures.com/
http://m.mofos.com/user/login                       m.mofos.com/
                                                    mofos.com/
http://mobile.teenslovehugecocks.com/user/join      mobile.teenslovehugecocks.com/
                                                    teenslovehugecocks.com/

• Including a single prefix for xhamster.com/ blacklists both fr.xhamster.com/ and nl.xhamster.com/

• No need to add an additional prefix for the French or Dutch version

Responsible disclosure and impact

• Disclosure to Mozilla Firefox

“We have long assumed (without the math to back it up) that if Google were evil it could seed the list with prefixes that allowed it to detect whether a few users visited a few select targets” - Mozilla Firefox

• Disclosure to Yandex

“We can’t promise but we plan to study them and provide you with our feedback” - Yandex Safe Browsing team

• Non-disclosure agreement with Google
    • Launch of Google Safe Browsing API v4 (in June 2016)

“Google does learn the hash prefixes of URLs, but the hash prefixes don’t provide much information about the actual URLs” - Google Safe Browsing v4 privacy policy

Conclusions & Future Work

Conclusions

Lesson learnt: collisions are hard to tame in security and privacy

Bloom filters

• Developers tend to ignore the worst case of algorithms

• Data structures with ad-hoc crypto primitives are, at best, risky

• Need for secure instantiations, e.g. as in Count-Min sketches

Safe Browsing

• Re-establishes the weakness of the anonymity-set privacy model

• Google and Yandex both employ the same privacy model
    • Google is privacy aware
    • Yandex less so

Future work

Bloom filters

• On Bloom filters: Bloom paradox [Rottenstreich 2015]

• Beyond Bloom filters: security of Bloom filter variants

Safe Browsing

• Accountability: need for a decentralized blacklist management system [Freudiger et al. 2015]

• Privacy: can a local cache improve Private Information Retrieval?

Thank you


Other works not covered today

• Performance of cryptographic accumulators

• Private password auditing

• (In)Security of Google and Yandex Safe Browsing

• Alerting websites: risks and solutions

• Decompression quines and anti-viruses

• Pitfalls of hashing for privacy

• Linkable (zero-knowledge) proofs for private and accountable gossip

Hashing

bull A function h 0 1lowast rarr 0 1` where ` is the digest size

bull Cryptographic (second) pre-image and collision resistant

h

h(x)

pre-imageresistance

h

x

h(x)

2nd pre-imageresistance

h

h(x prime)

6=

=

h

h(x)

collisionresistance

h

h(x prime)

6=

=

2` 2` 2`2Best genericattack

2

Are collisions always bad (I)

A simple use case

bull Instead of storing n (large) data items store their digestsbull If ` is large collisions are hard to find rArr space required = ntimes ` bits

Collisions for further space savings

x1

x2

x3

xn

d1

di

dm

Data Digests

bull di now substitutes both x1 and x2 rArr space required lt n times ` bitsbull Caveat May introduce some unexpected behaviorbull Core of several efficient (probabilistic) data structures

bull Bloom filters for membership testingbull Sketches for data stream analysis 3

Are collisions always bad (II)

Use case in privacy

bull Hashing as a pseudonymization technique

bull If ` is large but identifiers n is enumerable (in reasonable time)bull Exhaustive search breaks pseudonymization

d

d1

d2

` is large ` is small

h

h

h

h

bull If ` is sufficiently smallbull On average n2` identifiers share the same pseudonymbull Notion of anonymity-setbull Caveat Provides weak anonymity guaranteesbull Employed in Google Safe Browsing a malicious URL detection tool 4

Contrasting perspectives and outline

Contrasting perspectives

bull Collisions have to be absolutely avoided in cryptography

bull Somewhat welcome in algorithms and data structures

bull Useful to some extent in the context of privacy

Goal Investigate the security and privacy implications of hash collisions

Focus for today

bull Security Bloom Filters1 The Power of Evil Choices in Bloom Filters DSNrsquo15

Joint work with T Gerbet and C Lauradoux2 Bloom Filters in Adversarial Settings Under submission

Joint work with C Lauradoux and P Lafourcade

bull Privacy Safe Browsing1 A Privacy Analysis of Google and Yandex Safe Browsing DSNrsquo16

Joint work with T Gerbet and C Lauradoux

5

Security Bloom Filters

Bloom filters [Bloom 1970]

Setup(m n k)

bull A binary vector ~z of size m compressing a set of n itemsbull k uniform and independent hash functions hi 0 1lowast rarr [0m minus 1]bull ~z initialized to ~0

Operations

bull Insert(x) Set bits of ~z at h1(x) hk(x) to 1

0 0 0 1 1 0 1 1 0 1

S = x1 x2 x3 k = 2

y1 isin S y2 = x2 y3 isin S(false positive)

bull Query(y) Return True if bits of ~z at h1(y) hk(y) are all 1bull False positive rate and its optimum value have been well studied 7

Our contributions

bull Define adversary models for Bloom filtersbull Query-only adversarybull Chosen-insertion adversarybull Deletion adversary

bull Specific to counting Bloom filters (not covered today)

bull DoS attacks on Bloom enabled software applicationsbull Increase false positive probabilitybull Increase query time

bull Worst-case analysis of Bloom filtersbull false-positive probabilitybull new filter parameters

bull Bloom hash tables as a potential replacement for Bloom filters

8

Query-only adversary

Capabilities Only queries to the filterAssumption State of the filter is known

Goals

bull Craft items that generate false positivesbull Probability to forge a false positive is

(wH (~z)

m

)k

wH(middot) is the Hamming weight

bull Or items whose processing leads to latencybull First k minus 1 bits are set to 1 and the k-th bit set to 0

1 0 1 0 1 1 0 1 1 0 1 0

y k = 3

bull The probability of finding such an item is

(m minus wH(~z)) middot(wH (~z)kminus1

)mk

9

Chosen-insertion adversary

Capabilities Can choose items to insert in the filterAssumption State of the filter is knownGoal Increase the false positive probability

Strategy Greedily insert x that maximizes bits set to 1

bull Each x sets k bits to 1

0 0 0 1 0 1 1 1 1 1 1 1

x1 x2 x3 x4

Impact

No attack Under attackbits set to 1 072nkopt nkopt

false positive rate (f ) 12kopt

(nkoptm

)kopt

10

Impact on a sample filter

Parameters m = 3200 n = 600 kopt = 4 fopt = 0077

0 100 200 300 400 500 600Number of inserted items

False positive probability

0

007

014

021

028

035

f advPartialf

fopt

insert last200 items

11

Applying adversary models

Factors enabling our attacks

bull Insecure hash functions

bull Digest truncation

bull High Bloom filter false positive rate

Vulnerable software applications

Software app Hashing Parameter info

Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10

sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014

12

Bypassing a forensic tool

NSRL forensic tool

bull A whitelist of ldquoknown safe filesrdquo

bull Stored and distributed as a Bloom filter

bull Maintained by NIST

A query-only attack Goal is to hide a contraband file

bull Adversary modifies the file to create a false positive

bull Modification should be easily reversible

bull The filter detects the file as safe

13

Countermeasure against chosen-insertion attacks

Use worst-case parameters for Bloom filters

bull Fix m n and choose k that minimizes false positive probability

f adv =

(nk

m

)k

Optimal values are

kadvopt =

m

enand f adv

opt = eminusmen

bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv

opt = 2 f advopt = 01

14

Summary of other attacks amp defenses

Attacks

Software app Attacks

Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only

Defenses

bull Use HMAC

bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table

15

Related work

bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc

bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter

16

Privacy Safe Browsing

Google Safe Browsing in Mozilla Firefox

18

And many others

19

Adverted privacy policy

ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT

ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton

Many Safe Browsing services are privacy unfriendly by design

ldquocannot determine the real URL from the informationreceivedrdquo mdash Google

bull Google seems to provide the most private service

bull Hence focus of this work

20

Google Safe Browsing When Why and How

bull When In 2008 by Google

bull Goals Protect frombull Phishing sitesbull Malware sites

bull How Easy-to-use APIs in C Python and PHP

bull Methodology Blacklists

bull Available in

bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day

bull Cloned by Yandex as Yandex Safe Browsing 21

Lookup API

bull Google harvests phishing and malware URLs to feed a blacklist

bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom

Issues

bull Does not scale Heavy network traffic

bull Privacy URLs are sent in clear

22

Improving privacy using a local cache

Database

Server ClientLocal cache

(3) Conditionalquery

(4) Answer

(1) Query

(2) Answer

Extended client

bull Communication with the server is reduced

bull Better privacy

23

Google Safe Browsing API (v3) Local cache

bull BlacklistsList name Description Entries

goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1

bull Does not handle URLs directly instead their SHA-256 digests

wwwevilcom SHA-256 cc7af8a31918

cc7af8a31918

32-bit prefix

bull Local cache contains prefixes 24

Clientrsquos behavior chart

Userrsquos input

URLCanonicalize andcompute digest

Foundprefix

Get fulldigests

Founddigest

MaliciousURL

Non-maliciousURL

yes

no

yes

no

25

Canonicalization and decompositions

bull Input URLhttpusrpwdabcport12extparam=1frags

bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password

bull Multiple decompositions are checked for a single URL

Decompositions of canonicalized URL

1 abc12extparam=12 abc12ext3 abc14 abc

5 bc12extparam=16 bc12ext7 bc18 bc

bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious

26

Purpose of computing decompositions

Memory saving

bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix

A more intricate example

Unsafe link Safe link

dbcebc

bcabc2

abc1abc

bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc

27

Privacy of Google Safe Browsing

ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy

Our goal A privacy analysis of Google and Yandex Safe Browsing

URL Prefix

wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3

bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set

28

Tracking Safe Browsing users

Our assumptions

bull Google and Yandex have incentives to behave maliciously

bull Wish to learn whether a user visits some selected URLs

How Google can track Safe Browsing users

bull Builds a list of prefixes to track

bull Includes these prefixes in the clientrsquos local cache

bull Learns from the requests whether a user visited a specific URL

bull Key parameter Anonymity-set size

29

Estimating anonymity-set size

bull Anonymity-set size of a prefix URLs that yield the prefix

Year URLs Domains

2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million

bull Estimate anonymity-set size Apply balls-into-bins model

Avg for URLs Avg for Domains

Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast

0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large

30

Sending multiple prefixes

Example with two prefixes

Decomposition Prefix

petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5

Intuitively

bull Prefix for petsymposiumorg is not enough for re-identification

bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix

bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification

31

Ambiguity on two prefixes

bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache

URL Decomposition Prefix

Target URL abcabc A

bc B

Ambiguity

Type I gabcgabc C

abc A

bc B

Type II gbcgbc A larr collision on 32 bits

bc B

Type III defdef A larr collision on 32 bits

ef B larr collision on 32 bits

bull P[Type III] = 1264

bull Type II URLs exist only when decomp on the domain gt 232

bull P[Type I] gt P[Type II] gt P[Type III]

bull Mainly only Type I URLs create ambiguity in re-identification 32

How to track a given URL A real-world example (I)

petsymposiumorg2015

petsymposiumorg

petsymposiumorg2016linksphp

petsymposiumorg2016cfpphp

petsymposiumorg2016faqsphp

Type I URLs for target URL

petsymposiumorg2016

Goal Identify users interested in PETs

bull Target URL is petsymposiumorg2016

33

How to track a given URL A real-world example (II)

bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp

Decomposition Prefix

petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5

bull Including 2 prefixes in the local cache rArr Anonymity set size of 4

bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes

bull Server receives 2 prefixes rArr visited page is the target URL

bull Server receives 3 prefixes rArr visited page is either of the leaf URLs

bull The third prefix decides which leaf URL was visited

bull Generalizable to any number of prefixes 34

Examples of URLs creating multiple hits

bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing

URL matching decomposition

Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf

17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp

httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues

1001cartesorgtag

Yandex

httpfrxhamstercomuservideofrxhamstercomxhamstercom

Contrasting perspectives and outline

Contrasting perspectives

• Collisions have to be absolutely avoided in cryptography
• Somewhat welcome in algorithms and data structures
• Useful to some extent in the context of privacy

Goal: Investigate the security and privacy implications of hash collisions

Focus for today

• Security: Bloom Filters
  1. The Power of Evil Choices in Bloom Filters, DSN'15. Joint work with T. Gerbet and C. Lauradoux
  2. Bloom Filters in Adversarial Settings, under submission. Joint work with C. Lauradoux and P. Lafourcade
• Privacy: Safe Browsing
  1. A Privacy Analysis of Google and Yandex Safe Browsing, DSN'16. Joint work with T. Gerbet and C. Lauradoux

Security: Bloom Filters

Bloom filters [Bloom 1970]

Setup(m, n, k)

• A binary vector $\vec{z}$ of size m compressing a set of n items
• k uniform and independent hash functions $h_i : \{0,1\}^* \rightarrow [0, m-1]$
• $\vec{z}$ initialized to $\vec{0}$

Operations

• Insert(x): Set bits of $\vec{z}$ at $h_1(x), \ldots, h_k(x)$ to 1

[Figure: filter 0 0 0 1 1 0 1 1 0 1 built from S = {x1, x2, x3} with k = 2; queries $y_1 \notin S$, $y_2 = x_2$, $y_3 \notin S$ (false positive)]

• Query(y): Return True if bits of $\vec{z}$ at $h_1(y), \ldots, h_k(y)$ are all 1
• False positive rate and its optimum value have been well studied
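The Setup, Insert and Query operations above can be sketched in a few lines of Python. This is only an illustration: the class name `BloomFilter` and the construction h_i(x) = SHA-256(i || x) mod m are assumptions made here to stand in for the k uniform and independent hash functions, not something taken from the talk.

```python
import hashlib


class BloomFilter:
    """Bit vector z of size m with k hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.z = [0] * m  # z is initialized to all zeros

    def _indexes(self, item: bytes):
        # Assumption: h_i(x) = SHA-256(i || x) mod m plays the role of the
        # k "uniform and independent" hash functions.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest, "big") % self.m

    def insert(self, item: bytes) -> None:
        for idx in self._indexes(item):
            self.z[idx] = 1

    def query(self, item: bytes) -> bool:
        # True means "possibly in the set"; False means "definitely not".
        return all(self.z[idx] for idx in self._indexes(item))


bf = BloomFilter(m=3200, k=4)
for name in (b"x1", b"x2", b"x3"):
    bf.insert(name)
print(bf.query(b"x2"), sum(bf.z))
```

A query can return True for an item that was never inserted (a false positive), but it never returns False for an inserted item.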

Our contributions

• Define adversary models for Bloom filters
  • Query-only adversary
  • Chosen-insertion adversary
  • Deletion adversary
    • Specific to counting Bloom filters (not covered today)
• DoS attacks on Bloom-enabled software applications
  • Increase false positive probability
  • Increase query time
• Worst-case analysis of Bloom filters
  • False-positive probability
  • New filter parameters
• Bloom hash tables as a potential replacement for Bloom filters

Query-only adversary

Capabilities: Only queries to the filter
Assumption: State of the filter is known

Goals

• Craft items that generate false positives
  • Probability to forge a false positive is

    $$\left(\frac{w_H(\vec{z})}{m}\right)^k$$

    where $w_H(\cdot)$ is the Hamming weight

• Or items whose processing leads to latency
  • First k − 1 bits are set to 1 and the k-th bit is set to 0

  [Figure: filter 1 0 1 0 1 1 0 1 1 0 1 0 queried with item y, k = 3]

  • The probability of finding such an item is

    $$\frac{\left(m - w_H(\vec{z})\right) \cdot w_H(\vec{z})^{k-1}}{m^k}$$
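As a worked example, the two probabilities can be evaluated on the 12-bit filter shown on this slide (Hamming weight 7, k = 3). The function names below are hypothetical and chosen only for this sketch.

```python
from fractions import Fraction


def forge_probability(weight: int, m: int, k: int) -> Fraction:
    """(w_H(z) / m)^k: a random item hits k bits that are all set."""
    return Fraction(weight, m) ** k


def latency_probability(weight: int, m: int, k: int) -> Fraction:
    """(m - w_H(z)) * w_H(z)^(k-1) / m^k: first k-1 bits set, k-th bit unset."""
    return Fraction((m - weight) * weight ** (k - 1), m ** k)


z = [1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]  # filter state from the slide
w, m, k = sum(z), len(z), 3

print(forge_probability(w, m, k))    # 343/1728, roughly 0.198
print(latency_probability(w, m, k))  # 245/1728, roughly 0.142
```

An adversary who knows the filter state therefore needs on the order of 1/probability random trials to find a suitable item.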

Chosen-insertion adversary

Capabilities: Can choose items to insert in the filter
Assumption: State of the filter is known
Goal: Increase the false positive probability

Strategy: Greedily insert x that maximizes bits set to 1

• Each x sets k bits to 1

[Figure: filter state 0 0 0 1 0 1 1 1 1 1 1 1 after inserting x1, x2, x3, x4]

Impact

                           No attack              Under attack
bits set to 1              $0.72\, n k_{opt}$     $n k_{opt}$
false positive rate (f)    $1/2^{k_{opt}}$        $\left(n k_{opt}/m\right)^{k_{opt}}$
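The greedy strategy can be simulated on a toy filter. The sketch below is a minimal illustration under stated assumptions: the hash construction (SHA-256 of a one-byte index plus the item, reduced mod m) and the function names are made up for the example, and the adversary simply enumerates candidate strings until one sets k previously unset bits.

```python
import hashlib
from itertools import count


def indexes(item: bytes, m: int, k: int):
    # Assumption: h_i(x) = SHA-256(i || x) mod m, for illustration only.
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + item).digest(), "big") % m
        for i in range(k)
    ]


def chosen_insertion(m: int, n: int, k: int):
    """Insert n adversarially chosen items, each setting k fresh bits."""
    z = [0] * m
    candidates = count()  # the adversary enumerates candidate items 0, 1, 2, ...
    for _ in range(n):
        for c in candidates:
            idx = indexes(str(c).encode(), m, k)
            # Keep the candidate only if it hits k distinct, still-unset bits.
            if len(set(idx)) == k and not any(z[i] for i in idx):
                for i in idx:
                    z[i] = 1
                break
    return z


m, n, k = 3200, 600, 4
z = chosen_insertion(m, n, k)
weight = sum(z)
print(weight, (weight / m) ** k)
```

With these parameters the filter ends with n·k = 2400 bits set, so the false positive probability is (2400/3200)^4, roughly 0.32, compared with about 0.077 when items are inserted without adversarial choice.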

Impact on a sample filter

Parameters: m = 3200, n = 600, $k_{opt} = 4$, $f_{opt} = 0.077$

[Figure: false positive probability (y-axis, 0 to 0.35) against number of inserted items (x-axis, 0 to 600); curves $f^{adv}$, Partial (adversary inserts the last 200 items) and $f$, with $f_{opt}$ as reference]

Applying adversary models

Factors enabling our attacks

• Insecure hash functions
• Digest truncation
• High Bloom filter false positive rate

Vulnerable software applications

Software app               Hashing       Parameter info
Scrapy (Web crawler)       NA            NA
Dablooms (Spam filter)     MurmurHash    n = 100000, f = 0.057
Squid (Web proxy)          MD5           f = 0.09, k = 4
AIEngine (NIDS)            C++ hash      $\ell = 13$, n = 5000, f = 0.45
NSRL (Forensic tool)       SHA-1         $\ell = 32$, $n \approx 14 \times 10^{6}$, $f = 8.08 \times 10^{-10}$
sdhash (Forensic tool)     SHA-1         $\ell = 11$, n = 128, f = 0.0014

Bypassing a forensic tool

NSRL forensic tool

• A whitelist of "known safe files"
• Stored and distributed as a Bloom filter
• Maintained by NIST

A query-only attack: Goal is to hide a contraband file

• Adversary modifies the file to create a false positive
• Modification should be easily reversible
• The filter detects the file as safe

Countermeasure against chosen-insertion attacks

Use worst-case parameters for Bloom filters

• Fix m, n and choose k that minimizes the false positive probability

  $$f^{adv} = \left(\frac{nk}{m}\right)^{k}$$

  Optimal values are

  $$k^{adv}_{opt} = \frac{m}{en} \quad \text{and} \quad f^{adv}_{opt} = e^{-m/(en)}$$

• Impact: On a sample Bloom filter with m = 3200, n = 600
  • Average case: $k_{opt} = 4$, $f_{opt} = 0.077$
  • Worst case: $k^{adv}_{opt} = 2$, $f^{adv}_{opt} \approx 0.1$
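The parameter choice can be checked numerically. The sketch below rounds k to the nearest integer and uses the standard average-case estimate (1 − e^{−kn/m})^k next to the worst-case bound (nk/m)^k from this slide; the function names are hypothetical. Computed values may differ slightly from the rounded figures quoted on the slide (the worst-case rate evaluates to roughly 0.14 for k = 2).

```python
import math


def average_case(m: int, n: int):
    """Classical optimum: k = (m/n) ln 2, f = (1 - e^(-kn/m))^k."""
    k = max(1, round(m / n * math.log(2)))
    return k, (1 - math.exp(-k * n / m)) ** k


def worst_case(m: int, n: int):
    """Adversarial optimum: k = m/(e n), f = (n k / m)^k."""
    k = max(1, round(m / (math.e * n)))
    return k, (n * k / m) ** k


m, n = 3200, 600
k_opt, f_opt = average_case(m, n)
k_adv, f_adv = worst_case(m, n)
f_attacked = (n * k_opt / m) ** k_opt  # average-case k under a chosen-insertion attack

print(k_opt, round(f_opt, 3))       # 4 0.077
print(k_adv, round(f_adv, 3))       # 2 0.141
print(round(f_attacked, 3))         # 0.316
```

Choosing the worst-case k costs some accuracy when nobody attacks (f rises above 0.077) but caps the damage a chosen-insertion adversary can do.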

Summary of other attacks & defenses

Attacks

Software app               Attacks
Scrapy (Web crawler)       chosen-insertion, query-only
Dablooms (Spam filter)     chosen-insertion, deletion
Squid (Web proxy)          chosen-insertion, query-only
AIEngine (NIDS)            query-only
sdhash (Forensic tool)     query-only

Defenses

• Use HMAC
• Use an alternate data structure: Bloom hash tables [Bloom 1970]
  • Resists chosen-insertion attacks better
  • Is often more memory efficient than Bloom filters
  • On average O(k) hash computations for items not in the table
  • On average O(ln k) for items in the table
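One way to read the "Use HMAC" defense is to derive the filter indexes from a keyed MAC, so that an adversary who does not know the key cannot predict which bits an item sets. The sketch below is an assumption-laden illustration (class name, key handling and index derivation are all made up here), not the construction evaluated in the talk.

```python
import hashlib
import hmac
import secrets


class KeyedBloomFilter:
    """Bloom filter whose indexes come from HMAC-SHA-256 under a secret key."""

    def __init__(self, m: int, k: int, key: bytes = None):
        self.m, self.k = m, k
        # Assumption: a fresh random 256-bit key that is kept server-side.
        self.key = key if key is not None else secrets.token_bytes(32)
        self.z = bytearray(m)

    def _indexes(self, item: bytes):
        for i in range(self.k):
            tag = hmac.new(self.key, i.to_bytes(4, "big") + item, hashlib.sha256).digest()
            yield int.from_bytes(tag, "big") % self.m

    def insert(self, item: bytes) -> None:
        for idx in self._indexes(item):
            self.z[idx] = 1

    def query(self, item: bytes) -> bool:
        return all(self.z[idx] for idx in self._indexes(item))


bf = KeyedBloomFilter(m=3200, k=4)
bf.insert(b"http://example.com/")
print(bf.query(b"http://example.com/"))
```

The trade-off is that the filter can no longer be evaluated by parties who do not hold the key, which matters for filters that are distributed publicly.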

Related work

• Algorithmic complexity attacks [Crosby et al. 2003]
  • DoS attacks against hash tables
  • Force hash tables to operate in O(n) instead of O(1)
  • Similar attacks against skip-lists, regular expressions, etc.
• Independent work on Bloom filters [Naor et al. 2015]
  • Provide a theoretical framework
  • Study a query-only adversary: Can only adaptively query the filter

Privacy: Safe Browsing

Google Safe Browsing in Mozilla Firefox

And many others

Advertised privacy policies

"We collect visited web pages, clickstream data or web address accessed, browser identifier and user ID" (WOT)

"collects information including IP address, the origin of the search and may share this info with a third party" (Norton)

Many Safe Browsing services are privacy unfriendly by design

"cannot determine the real URL from the information received" (Google)

• Google seems to provide the most private service
• Hence the focus of this work

Google Safe Browsing: When, Why and How

• When: In 2008, by Google
• Goals: Protect from
  • Phishing sites
  • Malware sites
• How: Easy-to-use APIs in C, Python and PHP
• Methodology: Blacklists
• Available in: [Figure: product logos]
• Impact
  • Billions of users
  • Detects thousands of new malicious websites per day
• Cloned by Yandex as Yandex Safe Browsing

Lookup API

• Google harvests phishing and malware URLs to feed a blacklist
• Client checks the status of a URL (e.g. example.com) using a simple HTTP GET/POST request to sb-ssl.google.com/safebrowsing/api/lookup

Issues

• Does not scale: Heavy network traffic
• Privacy: URLs are sent in clear

Improving privacy using a local cache

[Figure: client extended with a local cache. (1) Query and (2) Answer are exchanged with the local cache; (3) Conditional query and (4) Answer are exchanged with the server, which is backed by a database]

• Communication with the server is reduced
• Better privacy

Google Safe Browsing API (v3): Local cache

• Blacklists

  List name                  Description          Entries
  goog-malware-shavar        malware              317807
  googpub-phish-shavar       phishing             312621
  goog-regtest-shavar        test file            29667
  goog-unwanted-shavar       unwanted software
  goog-whitedomain-shavar    unused               1

• Does not handle URLs directly, instead their SHA-256 digests

  [Figure: www.evil.com → SHA-256 → digest cc7af8a31918...; the leading 32 bits (cc7af8a3) form the prefix]

• Local cache contains prefixes
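Computing such a prefix takes only the standard library. The helper name `sha256_prefix` below is hypothetical, and the exact string that is hashed (trailing slash, scheme removal and so on) depends on the canonicalization rules, so the sketch makes no claim about reproducing the prefix shown on the slide.

```python
import hashlib


def sha256_prefix(expression: str, bits: int = 32) -> bytes:
    """Return the leading `bits` bits of the SHA-256 digest of an expression."""
    digest = hashlib.sha256(expression.encode("utf-8")).digest()
    return digest[: bits // 8]


full = hashlib.sha256(b"www.evil.com/").hexdigest()
prefix = sha256_prefix("www.evil.com/")
print(full)
print(prefix.hex())  # first 8 hex characters of the line above
```

Storing 4 bytes instead of 32 per blacklisted entry is what keeps the local cache small, at the cost of collisions between unrelated URLs.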

Client's behavior chart

[Flowchart: User's input URL → Canonicalize and compute digest → Found prefix? If no: non-malicious URL. If yes: Get full digests → Found digest? If yes: malicious URL. If no: non-malicious URL]
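The flow in the chart can be sketched as follows. Everything here is illustrative: `check_url`, `fake_server`, the `evil.example` blacklist and the 4-byte prefix length are assumptions for the example, and the real client also handles list updates and caching of full digests.

```python
import hashlib


def digest(expression: str) -> bytes:
    return hashlib.sha256(expression.encode("utf-8")).digest()


def check_url(decompositions, local_prefixes, fetch_full_digests):
    """Return (verdict, prefixes that were sent to the server)."""
    digests = [digest(d) for d in decompositions]
    hits = [d[:4] for d in digests if d[:4] in local_prefixes]
    if not hits:
        return "non-malicious", []           # no prefix found: nothing is sent
    full_digests = fetch_full_digests(hits)  # the server learns these prefixes
    if any(d in full_digests for d in digests):
        return "malicious", hits
    return "non-malicious", hits


# Hypothetical blacklist standing in for the server's database.
BLACKLIST = {digest("evil.example/")}
LOCAL_PREFIXES = {d[:4] for d in BLACKLIST}


def fake_server(prefixes):
    return {d for d in BLACKLIST if d[:4] in prefixes}


print(check_url(["evil.example/page", "evil.example/"], LOCAL_PREFIXES, fake_server)[0])
print(check_url(["good.example/"], LOCAL_PREFIXES, fake_server)[0])
```

The second element of the returned tuple is exactly the information that leaves the client, which is what the privacy analysis later in the talk focuses on.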

Canonicalization and decompositions

• Input URL: http://usr:pwd@a.b.c:port/1/2.ext?param=1#frags
• Canonicalize(Input URL) → http://a.b.c/1/2.ext?param=1
  • Canonicalization for privacy too: Removes username and password
• Multiple decompositions are checked for a single URL

Decompositions of canonicalized URL

1. a.b.c/1/2.ext?param=1
2. a.b.c/1/2.ext
3. a.b.c/1/
4. a.b.c/
5. b.c/1/2.ext?param=1
6. b.c/1/2.ext
7. b.c/1/
8. b.c/

• Each matching prefix is sent to the server
• Any matching full digest ⇒ Initial URL is malicious
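The eight decompositions listed above can be generated mechanically from the canonicalized URL. The function below is a simplified sketch written for this example: it ignores IP-address hosts and the caps the real API places on the number of host suffixes and path prefixes, and the name `decompositions` is hypothetical.

```python
def decompositions(canonical_url: str):
    """Host-suffix / path-prefix combinations of a canonicalized URL."""
    rest = canonical_url.split("://", 1)[-1]       # drop the scheme
    host, _, path_query = rest.partition("/")
    path, sep, query = ("/" + path_query).partition("?")

    # Host candidates: the full host, then suffixes down to two labels.
    labels = host.split(".")
    hosts = [host] + [".".join(labels[i:]) for i in range(1, len(labels) - 1)]

    # Path candidates: path with query, path alone, then directory prefixes.
    paths = [path + "?" + query] if sep else []
    if path not in paths:
        paths.append(path)
    directories = path.split("/")[1:-1]
    for j in range(len(directories), -1, -1):
        prefix = "/" + "".join(d + "/" for d in directories[:j])
        if prefix not in paths:
            paths.append(prefix)

    return [h + p for h in hosts for p in paths]


for d in decompositions("http://a.b.c/1/2.ext?param=1"):
    print(d)
```

Each of these strings is hashed separately, so a single page visit can produce several prefix lookups against the local cache.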

Purpose of computing decompositions

Memory saving

• A domain which hosts only malicious URLs
  • Naive blacklisting: Include all malicious prefixes in the local cache
  • Memory-efficient blacklisting: Include only the domain prefix

A more intricate example

[Figure: tree of links under b.c/ with nodes a.b.c/, a.b.c/1, a.b.c/2, d.b.c/ and e.b.c/, each marked as an unsafe or a safe link]

• Naive blacklisting: Include a.b.c/1, a.b.c/2 and d.b.c/
• Memory-efficient blacklisting: Include only a.b.c/ and d.b.c/

Privacy of Google Safe Browsing

"Google cannot determine the real URL from the information received" (Google Safe Browsing v3 privacy policy)

Our goal: A privacy analysis of Google and Yandex Safe Browsing

URL                              Prefix
www.evil.com                     cc7af8a3
www.example-1.com/11893474       cc7af8a3
www.example-2.com/5234456210     cc7af8a3
www.example-3.com/616445242      cc7af8a3

• Privacy due to anonymity-set
  • Estimate the anonymity-set size
  • Does it suffice to have a large anonymity-set?

Tracking Safe Browsing users

Our assumptions

• Google and Yandex have incentives to behave maliciously
• Wish to learn whether a user visits some selected URLs

How Google can track Safe Browsing users

• Builds a list of prefixes to track
• Includes these prefixes in the client's local cache
• Learns from the requests whether a user visited a specific URL
• Key parameter: Anonymity-set size

Estimating anonymity-set size

• Anonymity-set size of a prefix: number of URLs that yield the prefix

Year    URLs           Domains
2008    1 trillion     177 million
2012    30 trillion    252 million
2013    60 trillion    271 million

• Estimate anonymity-set size: Apply balls-into-bins model

                         Avg. for URLs                  Avg. for domains
Prefix length (bits)     2008      2012      2013       2008    2012    2013
16                       $2^{23}$  $2^{28}$  $2^{29}$   2700    3845    4135
32                       232       6984      13969      0.04    0.05    0.06
64                       0*        0*        0*         0*      0*      0*

0* denotes a value very close to 0

• Domains and URLs cannot be distinguished
• Anonymity-set size seems to be large
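Under the balls-into-bins model, the average anonymity-set size is simply the population divided by the number of possible prefixes, N / 2^ℓ. The sketch below assumes URL counts of 1, 30 and 60 trillion for 2008, 2012 and 2013, which is the magnitude that reproduces the averages in the table; the function name is hypothetical.

```python
def average_anonymity_set(population: int, prefix_bits: int) -> float:
    """Balls-into-bins average: population spread over 2^prefix_bits prefixes."""
    return population / 2 ** prefix_bits


urls = {2008: 1 * 10**12, 2012: 30 * 10**12, 2013: 60 * 10**12}      # assumed: trillions
domains = {2008: 177 * 10**6, 2012: 252 * 10**6, 2013: 271 * 10**6}

for year in (2008, 2012, 2013):
    print(
        year,
        int(average_anonymity_set(urls[year], 32)),        # URLs per 32-bit prefix
        int(average_anonymity_set(domains[year], 16)),     # domains per 16-bit prefix
        round(average_anonymity_set(domains[year], 32), 3),
    )
```

For 32-bit prefixes this gives hundreds to thousands of URLs per prefix but well under one domain per prefix, which is why a single domain-level prefix already narrows things down considerably.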

Sending multiple prefixes

Example with two prefixes

Decomposition                      Prefix
petsymposium.org/2016/cfp.php      0xe70ee6d1
petsymposium.org/2016/             0x1d13ba6a
petsymposium.org/                  0x33a02ef5

Intuitively

• Prefix for petsymposium.org/ is not enough for re-identification
• Sending two 32-bit prefixes for petsymposium.org/ and petsymposium.org/2016/ ≈ sending one 64-bit prefix
• The maximum anonymity-set size for 64-bit prefixes is 1 ⇒ Should lead to re-identification

Ambiguity on two prefixes

• More than two distinct URLs may yield the same two prefixes
• Consider a user visiting a.b.c with prefixes A and B in local cache

             URL        Decomposition    Prefix
Target URL   a.b.c      a.b.c/           A
                        b.c/             B

Ambiguity

Type I       g.a.b.c    g.a.b.c/         C
                        a.b.c/           A
                        b.c/             B
Type II      g.b.c      g.b.c/           A  ← collision on 32 bits
                        b.c/             B
Type III     d.e.f      d.e.f/           A  ← collision on 32 bits
                        e.f/             B  ← collision on 32 bits

• $P[\text{Type III}] = 1/2^{64}$
• Type II URLs exist only when the number of decompositions on the domain is $> 2^{32}$
• $P[\text{Type I}] > P[\text{Type II}] > P[\text{Type III}]$
• Mainly, only Type I URLs create ambiguity in re-identification

How to track a given URL: A real-world example (I)

[Figure: URL tree rooted at petsymposium.org/ with children petsymposium.org/2015/ and petsymposium.org/2016/; the leaves petsymposium.org/2016/links.php, petsymposium.org/2016/cfp.php and petsymposium.org/2016/faqs.php are the Type I URLs for the target URL petsymposium.org/2016/]

Goal: Identify users interested in PETs

• Target URL is petsymposium.org/2016/

How to track a given URL: A real-world example (II)

• Target URL has Type I ambiguity with cfp.php, links.php, faqs.php

Decomposition                       Prefix
petsymposium.org/2016/faqs.php      0xaec10b3a
petsymposium.org/2016/              0x1d13ba6a
petsymposium.org/                   0x33a02ef5

• Including 2 prefixes in the local cache ⇒ Anonymity-set size of 4
• To remove any ambiguity
  • Need to include 3 additional prefixes for cfp.php, links.php, faqs.php
  • A total of 5 prefixes
• Server receives 2 prefixes ⇒ visited page is the target URL
• Server receives 3 prefixes ⇒ visited page is one of the leaf URLs
  • The third prefix decides which leaf URL was visited
• Generalizable to any number of prefixes
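The counting argument above can be replayed with a toy model of the server's view. The sketch recomputes prefixes with SHA-256 rather than reusing the hexadecimal values from the slide, and the names `tracked`, `client_request` and `reidentify` are hypothetical; it only illustrates why five tracked prefixes remove the ambiguity.

```python
import hashlib


def prefix(expression: str) -> bytes:
    return hashlib.sha256(expression.encode("utf-8")).digest()[:4]


ROOT = "petsymposium.org/"
TARGET = "petsymposium.org/2016/"
LEAVES = [TARGET + page for page in ("cfp.php", "links.php", "faqs.php")]

# The five prefixes that would be placed in the client's local cache.
tracked = {prefix(e): e for e in [ROOT, TARGET, *LEAVES]}


def client_request(url_decompositions):
    """Prefixes the client sends: those of its decompositions found in the cache."""
    return [prefix(d) for d in url_decompositions if prefix(d) in tracked]


def reidentify(received):
    """Server side: the deepest tracked expression among the received prefixes."""
    expressions = [tracked[p] for p in received if p in tracked]
    return max(expressions, key=len) if expressions else None


visit_target = client_request([TARGET, ROOT])              # 2 prefixes sent
visit_faqs = client_request([LEAVES[2], TARGET, ROOT])     # 3 prefixes sent
print(len(visit_target), reidentify(visit_target))
print(len(visit_faqs), reidentify(visit_faqs))
```

A visit to petsymposium.org/2015/ would send only the root prefix, so in this model the server can still tell it apart from a visit to the target.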

Examples of URLs creating multiple hits

• Over 1300 such URLs distributed over 30 domains
• More frequent in Yandex than in Google Safe Browsing

Service   URL                                               Matching decompositions
Google    http://wps3b.17buddies.net/wp/cs_sub_7-2.pwf      17buddies.net/wp/cs_sub_7-2.pwf, 17buddies.net/wp/
Google    http://www.1001cartes.org/tag/emergency-issues    1001cartes.org/tag/emergency-issues, 1001cartes.org/tag/
Yandex    http://fr.xhamster.com/user/video                 fr.xhamster.com/, xhamster.com/
Yandex    http://nl.xhamster.com/user/video                 nl.xhamster.com/, xhamster.com/
Yandex    http://m.wickedpictures.com/user/login            m.wickedpictures.com/, wickedpictures.com/
Yandex    http://m.mofos.com/user/login                     m.mofos.com/, mofos.com/
Yandex    http://mobile.teenslovehugecocks.com/user/join    mobile.teenslovehugecocks.com/, teenslovehugecocks.com/

• Including a single prefix for xhamster.com/ blacklists both fr.xhamster.com/ and nl.xhamster.com/
• No need to add an additional prefix for the French or Dutch version

Responsible disclosure and impact

• Disclosure to Mozilla Firefox

  "We have long assumed (without the math to back it up) that if Google were evil, it could seed the list with prefixes that allowed it to detect whether a few users visited a few select targets" (Mozilla Firefox)

• Disclosure to Yandex

  "We can't promise, but we plan to study them and provide you with our feedback" (Yandex Safe Browsing team)

• Non-disclosure agreement with Google
• Launch of Google Safe Browsing API v4 (in June 2016)

  "Google does learn the hash prefixes of URLs, but the hash prefixes don't provide much information about the actual URLs" (Google Safe Browsing v4 privacy policy)

Conclusions & Future Work

Conclusions

Lesson learnt: Collisions are hard to tame in security and privacy

Bloom filters

• Developers tend to ignore the worst case of algorithms
• Data structures with ad-hoc crypto primitives are at best risky
• Need for secure instantiations, e.g. as in Count-Min sketches

Safe Browsing

• Re-establish the weakness of the anonymity-set privacy model
• Google and Yandex both employ the same privacy model
  • Google is privacy aware
  • Yandex less so

Future work

Bloom filters

• On Bloom filters: Bloom paradox [Rottenstreich 2015]
• Beyond Bloom filters: Security of Bloom filter variants

Safe Browsing

• Accountability: Need for a decentralized blacklist management system [Freudiger et al. 2015]
• Privacy: Can a local cache improve Private Information Retrieval?

Thank you

Other works not covered today

• Performance of cryptographic accumulators
• Private password auditing
• (In)Security of Google and Yandex Safe Browsing
• Alerting websites: Risks and solutions
• Decompression quines and anti-viruses
• Pitfalls of hashing for privacy
• Linkable (zero-knowledge) proofs for private and accountable gossip

Contrasting perspectives and outline

Contrasting perspectives

bull Collisions have to be absolutely avoided in cryptography

bull Somewhat welcome in algorithms and data structures

bull Useful to some extent in the context of privacy

Goal Investigate the security and privacy implications of hash collisions

Focus for today

bull Security Bloom Filters1 The Power of Evil Choices in Bloom Filters DSNrsquo15

Joint work with T Gerbet and C Lauradoux2 Bloom Filters in Adversarial Settings Under submission

Joint work with C Lauradoux and P Lafourcade

bull Privacy Safe Browsing1 A Privacy Analysis of Google and Yandex Safe Browsing DSNrsquo16

Joint work with T Gerbet and C Lauradoux

5

Security Bloom Filters

Bloom filters [Bloom 1970]

Setup(m n k)

bull A binary vector ~z of size m compressing a set of n itemsbull k uniform and independent hash functions hi 0 1lowast rarr [0m minus 1]bull ~z initialized to ~0

Operations

bull Insert(x) Set bits of ~z at h1(x) hk(x) to 1

0 0 0 1 1 0 1 1 0 1

S = x1 x2 x3 k = 2

y1 isin S y2 = x2 y3 isin S(false positive)

bull Query(y) Return True if bits of ~z at h1(y) hk(y) are all 1bull False positive rate and its optimum value have been well studied 7

Our contributions

• Define adversary models for Bloom filters
  - Query-only adversary
  - Chosen-insertion adversary
  - Deletion adversary: specific to counting Bloom filters (not covered today)

• DoS attacks on Bloom-enabled software applications
  - Increase false positive probability
  - Increase query time

• Worst-case analysis of Bloom filters
  - False positive probability
  - New filter parameters

• Bloom hash tables as a potential replacement for Bloom filters

Query-only adversary

Capabilities: Only queries to the filter
Assumption: State of the filter is known

Goals

• Craft items that generate false positives
  - The probability of forging a false positive is $\left(\frac{w_H(\vec{z})}{m}\right)^k$, where $w_H(\cdot)$ is the Hamming weight

• Or items whose processing leads to latency
  - The first $k-1$ probed bits are set to 1 and the $k$-th bit is set to 0
  - The probability of finding such an item is $\frac{(m - w_H(\vec{z})) \cdot w_H(\vec{z})^{k-1}}{m^k}$

[Figure: filter state 1 0 1 0 1 1 0 1 1 0 1 0 probed by an item y with k = 3]
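As a quick numerical check of the two probabilities above, the sketch below evaluates them on the 12-bit filter state shown on the slide. The function names are hypothetical, and the computation assumes, as the formulas do, that each of the k probes hits a uniformly random position.

```python
def forge_false_positive_prob(weight: int, m: int, k: int) -> float:
    """Probability that all k probed bits are already 1: (w_H(z) / m)^k."""
    return (weight / m) ** k


def latency_item_prob(weight: int, m: int, k: int) -> float:
    """Probability that the first k-1 probed bits are 1 and the k-th is 0."""
    return (m - weight) * weight ** (k - 1) / m ** k


# Filter state from the slide, with k = 3.
z = [1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
w = sum(z)  # Hamming weight of z
print(forge_false_positive_prob(w, len(z), 3))
print(latency_item_prob(w, len(z), 3))
```

The second quantity matters for the latency goal because a query that fails only on its last probe forces the filter to evaluate all k hash functions before rejecting the item.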

Chosen-insertion adversary

Capabilities: Can choose items to insert in the filter
Assumption: State of the filter is known
Goal: Increase the false positive probability

Strategy: Greedily insert the item x that maximizes the number of bits set to 1

• Each x sets k new bits to 1

[Figure: filter state 0 0 0 1 0 1 1 1 1 1 1 1 after the insertion of x1, x2, x3, x4]

Impact

                              No attack               Under attack
Bits set to 1                 $0.72\,n k_{opt}$       $n k_{opt}$
False positive rate ($f$)     $(1/2)^{k_{opt}}$       $(n k_{opt}/m)^{k_{opt}}$
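The greedy strategy can be simulated on a toy filter. The sketch below is only an illustration under stated assumptions: the hash functions are modelled as SHA-256 with a one-byte index (not the hashing of any real application), the adversary enumerates hypothetical candidate items `evil-0`, `evil-1`, ... and keeps those whose k positions are distinct and still unset, and it requires n·k ≤ m so that such items exist.

```python
import hashlib
from itertools import count


def indexes(item: bytes, m: int, k: int) -> list:
    # Assumption: h_i(x) = SHA-256(i || x) mod m models the k hash functions.
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + item).digest(), "big") % m
        for i in range(k)
    ]


def honest_weight(m: int, n: int, k: int) -> int:
    """Hamming weight after inserting n items chosen without regard to z."""
    z = [0] * m
    for j in range(n):
        for idx in indexes(b"honest-%d" % j, m, k):
            z[idx] = 1
    return sum(z)


def adversarial_weight(m: int, n: int, k: int) -> int:
    """Hamming weight when every inserted item sets k fresh bits."""
    if n * k > m:
        raise ValueError("not enough unset bits for n items")
    z = [0] * m
    candidates = count()  # shared counter: candidates are never retried
    for _ in range(n):
        for c in candidates:
            idxs = indexes(b"evil-%d" % c, m, k)
            if len(set(idxs)) == k and not any(z[i] for i in idxs):
                for i in idxs:
                    z[i] = 1
                break
    return sum(z)


print(honest_weight(3200, 600, 4), adversarial_weight(3200, 600, 4))
```

With these parameters the adversarial filter ends with exactly n·k = 2400 bits set, while honest insertions leave many positions shared between items, which is the gap summarized in the impact table.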

Impact on a sample filter

Parameters: $m = 3200$, $n = 600$, $k_{opt} = 4$, $f_{opt} = 0.077$

[Figure: false positive probability (0 to 0.35) against the number of inserted items (0 to 600). Curves: $f^{adv}$, Partial (the adversary inserts the last 200 items), $f$, and the level $f_{opt}$.]
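The curves of the figure can be approximated analytically. The sketch below is an assumption-laden reconstruction, not the code behind the plot: the honest curve uses the expected fill ratio $1 - (1 - 1/m)^{ki}$, the adversarial curve assumes every item sets k fresh bits, and the partial curve assumes the first 400 insertions are honest and the last 200 are adversarial.

```python
def honest_fpr(i: int, m: int, k: int) -> float:
    """Expected false positive probability after i honest insertions."""
    fill = 1 - (1 - 1 / m) ** (k * i)
    return fill ** k


def adversarial_fpr(i: int, m: int, k: int) -> float:
    """False positive probability when each item sets k fresh bits."""
    return min(1.0, k * i / m) ** k


def partial_fpr(i: int, m: int, k: int, honest_items: int) -> float:
    """Honest insertions first, then adversarial ones (assumed model)."""
    if i <= honest_items:
        return honest_fpr(i, m, k)
    weight = m * (1 - (1 - 1 / m) ** (k * honest_items)) + k * (i - honest_items)
    return min(1.0, weight / m) ** k


m, n, k = 3200, 600, 4
for i in (200, 400, 600):
    print(i, honest_fpr(i, m, k), partial_fpr(i, m, k, 400), adversarial_fpr(i, m, k))
```

At i = 600 the honest value lands close to the stated $f_{opt} = 0.077$, and the fully adversarial value is $0.75^4 \approx 0.32$, consistent with the range of the vertical axis.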

Applying adversary models

Factors enabling our attacks

• Insecure hash functions
• Digest truncation
• High Bloom filter false positive rate

Vulnerable software applications

Software app   Type            Hashing       Parameter info
Scrapy         Web crawler     NA            NA
Dablooms       Spam filter     MurmurHash    n = 100000, f = 0.057
Squid          Web proxy       MD5           f = 0.09, k = 4
AIEngine       NIDS            C++ hash      ℓ = 13, n = 5000, f = 0.45
NSRL           Forensic tool   SHA-1         ℓ = 32, n ≈ 14 × 10^6, f = 8.08 × 10^-10
sdhash         Forensic tool   SHA-1         ℓ = 11, n = 128, f = 0.0014

Bypassing a forensic tool

NSRL forensic tool

• A whitelist of “known safe files”
• Stored and distributed as a Bloom filter
• Maintained by NIST

A query-only attack: the goal is to hide a contraband file

• The adversary modifies the file to create a false positive
• The modification should be easily reversible
• The filter detects the file as safe
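The idea of the query-only attack can be illustrated on a toy whitelist. Everything below is hypothetical: the filter is tiny (256 bits, 3 hash functions) so that a plain brute-force search succeeds quickly, the hash functions are SHA-256 with a one-byte index rather than the tool's actual hashing, and the reversible modification is simply a marker plus a counter appended to the file.

```python
import hashlib
from itertools import count

M, K = 256, 3          # toy parameters, far smaller than a real deployment
MARKER = b"\x00PAD"    # hypothetical separator that makes the change reversible


def indexes(data: bytes) -> list:
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + data).digest(), "big") % M
        for i in range(K)
    ]


def build_whitelist(files) -> list:
    z = [0] * M
    for f in files:
        for idx in indexes(f):
            z[idx] = 1
    return z


def looks_safe(z: list, data: bytes) -> bool:
    return all(z[i] for i in indexes(data))


def forge(z: list, contraband: bytes) -> bytes:
    """Append padding until the modified file is a false positive."""
    for nonce in count():
        candidate = contraband + MARKER + str(nonce).encode()
        if looks_safe(z, candidate):
            return candidate


whitelist = build_whitelist([b"safe-file-%d" % i for i in range(60)])
hidden = forge(whitelist, b"contraband")
original = hidden.split(MARKER)[0]  # the modification is undone by stripping the padding
```

The expected number of candidates tried is the inverse of the false positive rate, so this blind search is only practical when that rate is high or when digest truncation and knowledge of the filter state make targeted forgeries cheaper.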

Countermeasure against chosen-insertion attacks

Use worst-case parameters for Bloom filters

• Fix $m$ and $n$, and choose the $k$ that minimizes the worst-case false positive probability

  $f^{adv} = \left(\frac{nk}{m}\right)^{k}$

  The optimal values are

  $k^{adv}_{opt} = \frac{m}{en}$ and $f^{adv}_{opt} = e^{-m/(en)}$

• Impact on a sample Bloom filter with $m = 3200$, $n = 600$:
  - Average case: $k_{opt} = 4$, $f_{opt} = 0.077$
  - Worst case: $k^{adv}_{opt} = 2$, $f^{adv}_{opt} = 0.1$
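The two parameter choices can be compared numerically. The sketch below assumes the standard average-case formulas $k_{opt} = (m/n)\ln 2$ and $f = (1 - e^{-kn/m})^k$, which are not spelled out on the slide, together with the worst-case expressions above; the worst-case optimum follows from setting the derivative of $k \ln(nk/m)$ to zero, which gives $k = m/(en)$. Rounding k to the nearest integer is also an assumption.

```python
import math


def average_case(m: int, n: int) -> tuple:
    """Classical parameters: k = (m/n) ln 2 and f = (1 - e^{-kn/m})^k."""
    k = max(1, round(m / n * math.log(2)))
    return k, (1 - math.exp(-k * n / m)) ** k


def worst_case(m: int, n: int) -> tuple:
    """Worst-case parameters: k = m/(en) and f_adv = (nk/m)^k."""
    k = max(1, round(m / (math.e * n)))
    return k, (n * k / m) ** k


print(average_case(3200, 600))
print(worst_case(3200, 600))
```

For the sample filter this gives k = 4 with a rate close to 0.077 in the average case, and k = 2 in the worst case. With k rounded to 2 the worst-case expression evaluates to about 0.14, the same order of magnitude as the 0.1 quoted above.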

Summary of other attacks & defenses

Attacks

Software app   Type            Attacks
Scrapy         Web crawler     chosen-insertion, query-only
Dablooms       Spam filter     chosen-insertion, deletion
Squid          Web proxy       chosen-insertion, query-only
AIEngine       NIDS            query-only
sdhash         Forensic tool   query-only

Defenses

• Use HMAC

• Use an alternate data structure: Bloom hash tables [Bloom 1970]
  - Resist chosen-insertion attacks better
  - Are often more memory efficient than Bloom filters
  - On average O(k) hash computations for items not in the table
  - On average O(ln k) for items in the table
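One way to read the HMAC defense is to key the index derivation so that an adversary who does not know the key cannot predict which bits an item sets. The sketch below is an assumption about how this could be wired up, using Python's standard `hmac` module; the function name `keyed_indexes` and the one-byte index prefix are illustrative choices.

```python
import hashlib
import hmac
import secrets


def keyed_indexes(key: bytes, item: bytes, m: int, k: int) -> list:
    """Derive k filter positions from HMAC-SHA-256 under a secret key."""
    return [
        int.from_bytes(
            hmac.new(key, bytes([i]) + item, hashlib.sha256).digest(), "big"
        ) % m
        for i in range(k)
    ]


key = secrets.token_bytes(32)  # kept secret by whoever maintains the filter
print(keyed_indexes(key, b"http://example.com/", 3200, 4))
```

The positions are stable for a given key, so insertions and queries still agree, but they change completely under a different key. This only helps when the party evaluating the filter can keep the key away from the adversary.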

Related work

• Algorithmic complexity attacks [Crosby et al. 2003]
  - DoS attacks against hash tables
  - Force hash tables to operate in O(n) instead of O(1)
  - Similar attacks against skip-lists, regular expressions, etc.

• Independent work on Bloom filters [Naor et al. 2015]
  - Provides a theoretical framework
  - Studies a query-only adversary that can only adaptively query the filter

Privacy: Safe Browsing

Google Safe Browsing in Mozilla Firefox

And many others

Advertised privacy policies

“We collect visited web pages, clickstream data or web address accessed, browser identifier and user ID” (WOT)

“collects information including IP address, the origin of the search and may share this info with a third party” (Norton)

Many Safe Browsing services are privacy unfriendly by design

“cannot determine the real URL from the information received” (Google)

• Google seems to provide the most private service
• Hence the focus of this work

Google Safe Browsing: When, Why and How

• When: In 2008, by Google

• Goals: Protect from
  - Phishing sites
  - Malware sites

• How: Easy-to-use APIs in C, Python and PHP

• Methodology: Blacklists

• Available in: [images omitted]

• Impact
  - Billions of users
  - Detects thousands of new malicious websites per day

• Cloned by Yandex as Yandex Safe Browsing

Lookup API

• Google harvests phishing and malware URLs to feed a blacklist

• The client checks the status of a URL using a simple HTTP GET/POST request to sb-ssl.google.com/safebrowsing/api/lookup (queried URL: example.com)

Issues

• Does not scale: heavy network traffic
• Privacy: URLs are sent in clear

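Improving privacy using a local cache

[Figure: an extended client made of the client and a local cache. The client sends (1) Query to the local cache and receives (2) Answer; it then sends (3) Conditional query to the server, which is backed by the database, and receives (4) Answer.]

• Communication with the server is reduced
• Better privacy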

Google Safe Browsing API (v3): Local cache

• Blacklists

List name                  Description          Entries
goog-malware-shavar        malware              317807
googpub-phish-shavar       phishing             312621
goog-regtest-shavar        test file            29667
goog-unwanted-shavar       unwanted software
goog-whitedomain-shavar    unused               1

• Does not handle URLs directly, but their SHA-256 digests

[Figure: www.evil.com → SHA-256 → cc7af8a31918…; the 32-bit prefix is cc7af8a3]

• The local cache contains prefixes
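Computing a 32-bit prefix is a one-liner with Python's standard `hashlib`. The sketch below is illustrative: the helper name `prefix32` is hypothetical, and the exact string that is hashed in practice (canonicalized expression, trailing slash, and so on) is an assumption, so the output is not claimed to reproduce the value shown on the slide.

```python
import hashlib


def prefix32(expression: str) -> str:
    """First 32 bits (4 bytes) of the SHA-256 digest, as 8 hex characters."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:4].hex()


print(prefix32("www.evil.com/"))
```

Only these 4 bytes per blacklisted entry are stored in the local cache, which keeps the cache small at the cost of collisions between unrelated URLs.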

Client's behavior chart

[Flowchart: User's input URL → Canonicalize and compute digest → Found prefix? If no: non-malicious URL. If yes: Get full digests → Found digest? If yes: malicious URL. If no: non-malicious URL.]
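The flowchart can be read as the following client-side routine. This is a sketch under assumptions: `check_url`, `fake_server`, and the in-memory blacklist are hypothetical stand-ins, and the real protocol (request batching, caching of answers, list updates) is not modelled.

```python
import hashlib

PREFIX_BYTES = 4  # 32-bit prefixes, as stored in the local cache


def digest(expression: str) -> bytes:
    return hashlib.sha256(expression.encode("utf-8")).digest()


def check_url(decompositions, local_prefixes, get_full_digests) -> bool:
    """Return True if any decomposition of the URL is blacklisted.

    get_full_digests(prefix) models the request to the server; it is the
    only point at which information leaves the client.
    """
    for expression in decompositions:
        d = digest(expression)
        prefix = d[:PREFIX_BYTES]
        if prefix not in local_prefixes:
            continue  # no prefix hit: nothing is sent for this decomposition
        if d in get_full_digests(prefix):
            return True  # prefix hit confirmed by a full digest
    return False  # non-malicious URL


# Hypothetical in-memory stand-in for the server-side blacklist.
BLACKLIST = {digest("evil.example/")}
LOCAL_CACHE = {d[:PREFIX_BYTES] for d in BLACKLIST}
sent_prefixes = []


def fake_server(prefix: bytes) -> set:
    sent_prefixes.append(prefix)  # what the server gets to observe
    return {d for d in BLACKLIST if d.startswith(prefix)}


print(check_url(["evil.example/page", "evil.example/"], LOCAL_CACHE, fake_server))
```

The list `sent_prefixes` makes the privacy question concrete: the server learns exactly the prefixes that matched the local cache, and nothing when there is no match.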

Canonicalization and decompositions

• Input URL: http://usr:pwd@a.b.c:port/1/2.ext?param=1#frags

• Canonicalize(Input URL) → http://a.b.c/1/2.ext?param=1
  - Canonicalization helps privacy too: it removes the username and password

• Multiple decompositions are checked for a single URL

Decompositions of the canonicalized URL

1. a.b.c/1/2.ext?param=1
2. a.b.c/1/2.ext
3. a.b.c/1/
4. a.b.c/
5. b.c/1/2.ext?param=1
6. b.c/1/2.ext
7. b.c/1/
8. b.c/

• Each matching prefix is sent to the server
• Any matching full digest ⇒ the initial URL is malicious
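The eight decompositions listed above can be generated mechanically. The sketch below is a simplified reading of the host-suffix and path-prefix rules and makes several assumptions: the input is already canonicalized, the host is a domain name rather than an IP address, at most five trailing host labels and four directory prefixes are used, and the function name `decompositions` is hypothetical.

```python
from urllib.parse import urlsplit


def decompositions(canonical_url: str) -> list:
    parts = urlsplit(canonical_url)
    host = parts.hostname or ""
    labels = host.split(".")

    # Hosts: the exact hostname, then suffixes built from the last five
    # labels, stopping before the bare top-level label.
    start = max(1, len(labels) - 5)
    hosts = [host] + [".".join(labels[i:]) for i in range(start, len(labels) - 1)]

    # Paths: exact path with query, exact path without query, then the
    # directory prefixes (at most four, counted from the root).
    path = parts.path or "/"
    paths = [path + "?" + parts.query] if parts.query else []
    paths.append(path)
    dirs = ["/"]
    for component in [c for c in path.split("/") if c][:-1]:
        dirs.append(dirs[-1] + component + "/")
    for d in reversed(dirs[:4]):
        if d not in paths:
            paths.append(d)

    return [h + p for h in hosts for p in paths]


for expression in decompositions("http://a.b.c/1/2.ext?param=1"):
    print(expression)
```

Each of these expressions is hashed separately, so a single visited URL can produce several prefix hits in the local cache.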

Purpose of computing decompositions

Memory saving

• A domain which hosts only malicious URLs
  - Naive blacklisting: include all malicious prefixes in the local cache
  - Memory-efficient blacklisting: include only the domain prefix

A more intricate example

[Figure: tree of the domain b.c with subdomains a.b.c, d.b.c and e.b.c and pages a.b.c/1 and a.b.c/2; links are marked as safe or unsafe.]

• Naive blacklisting: include a.b.c/1, a.b.c/2 and d.b.c
• Memory-efficient blacklisting: include only a.b.c and d.b.c

Privacy of Google Safe Browsing

“Google cannot determine the real URL from the information received” (Google Safe Browsing v3 privacy policy)

Our goal: A privacy analysis of Google and Yandex Safe Browsing

URL                              Prefix
www.evil.com                     cc7af8a3
www.example-1.com/11893474       cc7af8a3
www.example-2.com/5234456210     cc7af8a3
www.example-3.com/616445242      cc7af8a3

• Privacy is due to the anonymity set
  - Estimate the anonymity-set size
  - Does it suffice to have a large anonymity set?

Tracking Safe Browsing users

Our assumptions

• Google and Yandex have incentives to behave maliciously
• They wish to learn whether a user visits some selected URLs

How Google can track Safe Browsing users

• Builds a list of prefixes to track
• Includes these prefixes in the client's local cache
• Learns from the requests whether a user visited a specific URL
• Key parameter: anonymity-set size

Estimating anonymity-set size

• Anonymity-set size of a prefix: the number of URLs that yield the prefix

Year    URLs           Domains
2008    1 trillion     177 million
2012    30 trillion    252 million
2013    60 trillion    271 million

• Estimate the anonymity-set size: apply the balls-into-bins model

                         Avg. for URLs                  Avg. for domains
Prefix length (bits)     2008      2012      2013       2008    2012    2013
16                       2^23      2^28      2^29       2700    3845    4135
32                       232       6984      13969      0.04    0.05    0.06
64                       0*        0*        0*         0*      0*      0*

0* is very close to 0

• Domains and URLs cannot be distinguished
• The anonymity-set size seems to be large
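The averages in the table follow from the balls-into-bins model: N items thrown uniformly into $2^{\ell}$ bins give $N / 2^{\ell}$ items per bin on average. The sketch below recomputes them; it assumes the URL counts are in units of $10^{12}$, which is what reproduces the tabulated figures (for instance 232 for 32-bit prefixes in 2008), and the function name is hypothetical.

```python
import math

URLS = {2008: 1e12, 2012: 30e12, 2013: 60e12}       # assumption: units of 10^12
DOMAINS = {2008: 177e6, 2012: 252e6, 2013: 271e6}


def avg_anonymity_set(population: float, prefix_bits: int) -> float:
    """Average number of items per prefix under the balls-into-bins model."""
    return population / 2 ** prefix_bits


for bits in (16, 32, 64):
    for year in (2008, 2012, 2013):
        print(bits, year,
              avg_anonymity_set(URLS[year], bits),
              avg_anonymity_set(DOMAINS[year], bits))

# 16-bit URL averages are reported as powers of two in the table.
print(int(math.log2(avg_anonymity_set(URLS[2008], 16))))
```

For 32-bit prefixes the average is in the hundreds or thousands for URLs but below 1 for domains, which is why a prefix computed on a domain alone is far more identifying than one computed on a full URL.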

Sending multiple prefixes

Example with two prefixes

Decomposition                    Prefix
petsymposium.org/2016/cfp.php    0xe70ee6d1
petsymposium.org/2016/           0x1d13ba6a
petsymposium.org/                0x33a02ef5

Intuitively

• The prefix for petsymposium.org/ is not enough for re-identification
• Sending two 32-bit prefixes for petsymposium.org/ and petsymposium.org/2016/ ≈ sending one 64-bit prefix
• The maximum anonymity-set size for 64-bit prefixes is 1 ⇒ this should lead to re-identification

Ambiguity on two prefixes

• More than two distinct URLs may yield the same two prefixes
• Consider a user visiting a.b.c with prefixes A and B in the local cache

              URL        Decomposition   Prefix
Target URL    a.b.c      a.b.c/          A
                         b.c/            B

Ambiguity
Type I        g.a.b.c    g.a.b.c/        C
                         a.b.c/          A
                         b.c/            B
Type II       g.b.c      g.b.c/          A  ← collision on 32 bits
                         b.c/            B
Type III      d.e.f      d.e.f/          A  ← collision on 32 bits
                         e.f/            B  ← collision on 32 bits

• $P[\text{Type III}] = 1/2^{64}$
• Type II URLs exist only when the number of decompositions on the domain exceeds $2^{32}$
• $P[\text{Type I}] > P[\text{Type II}] > P[\text{Type III}]$
• In practice, mainly Type I URLs create ambiguity in re-identification

How to track a given URL: A real-world example (I)

[Figure: URL tree of petsymposium.org with the nodes petsymposium.org/2015 and petsymposium.org/2016; the leaves petsymposium.org/2016/links.php, petsymposium.org/2016/cfp.php and petsymposium.org/2016/faqs.php are the Type I URLs for the target URL petsymposium.org/2016.]

Goal: Identify users interested in PETs

• The target URL is petsymposium.org/2016

How to track a given URL: A real-world example (II)

• The target URL has Type I ambiguity with cfp.php, links.php and faqs.php

Decomposition                     Prefix
petsymposium.org/2016/faqs.php    0xaec10b3a
petsymposium.org/2016/            0x1d13ba6a
petsymposium.org/                 0x33a02ef5

• Including 2 prefixes in the local cache ⇒ anonymity-set size of 4

• To remove any ambiguity
  - Need to include 3 additional prefixes, for cfp.php, links.php and faqs.php
  - A total of 5 prefixes

• Server receives 2 prefixes ⇒ the visited page is the target URL
• Server receives 3 prefixes ⇒ the visited page is one of the leaf URLs
• The third prefix decides which leaf URL was visited
• Generalizable to any number of prefixes
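The inference rule in the bullets above (2 prefixes received means the target, 3 means a leaf) can be sketched from the server's point of view. The code is illustrative only: prefixes are recomputed with SHA-256 on assumed canonical expressions, so they are not claimed to equal the hexadecimal values in the table, and the names `tracked`, `client_request`, and `infer_visit` are hypothetical.

```python
import hashlib


def prefix32(expression: str) -> bytes:
    return hashlib.sha256(expression.encode("utf-8")).digest()[:4]


DOMAIN = "petsymposium.org/"
TARGET = "petsymposium.org/2016/"
LEAVES = [TARGET + page for page in ("cfp.php", "links.php", "faqs.php")]

# The 5 prefixes placed in the client's local cache.
tracked = {prefix32(e): e for e in [DOMAIN, TARGET, *LEAVES]}


def client_request(visited_decompositions) -> set:
    """Prefixes a client would send: only those found in its local cache."""
    return {prefix32(e) for e in visited_decompositions if prefix32(e) in tracked}


def infer_visit(received: set):
    """Map the set of received prefixes back to the visited page, if possible."""
    hits = {tracked[p] for p in received if p in tracked}
    if not {DOMAIN, TARGET} <= hits:
        return None
    leaves = [e for e in hits if e in LEAVES]
    if not leaves:
        return TARGET          # 2 prefixes: the target URL itself
    if len(leaves) == 1:
        return leaves[0]       # 3 prefixes: the third one names the leaf
    return None


print(infer_visit(client_request([LEAVES[0], TARGET, DOMAIN])))
```

The design point is that no single 32-bit prefix identifies the page; it is the combination of prefixes arriving in one request that removes the ambiguity.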

Examples of URLs creating multiple hits

• Over 1300 such URLs, distributed over 30 domains
• More frequent in Yandex than in Google Safe Browsing

URL                                               Matching decompositions

Google
http://wps3b.17buddies.net/wp/cs_sub_7-2.pwf      17buddies.net/wp/cs_sub_7-2.pwf, 17buddies.net/wp/
http://www.1001cartes.org/tag/emergency-issues    1001cartes.org/tag/emergency-issues, 1001cartes.org/tag/

Yandex
http://fr.xhamster.com/user/video                 fr.xhamster.com/, xhamster.com/
http://nl.xhamster.com/user/video                 nl.xhamster.com/, xhamster.com/
http://m.wickedpictures.com/user/login            m.wickedpictures.com/, wickedpictures.com/
http://m.mofos.com/user/login                     m.mofos.com/, mofos.com/
http://mobile.teenslovehugecocks.com/user/join    mobile.teenslovehugecocks.com/, teenslovehugecocks.com/

• Including a single prefix for xhamster.com blacklists both fr.xhamster.com and nl.xhamster.com
• There is no need to add an additional prefix for the French or Dutch version

Responsible disclosure and impact

• Disclosure to Mozilla Firefox

“We have long assumed (without the math to back it up) that if Google were evil, it could seed the list with prefixes that allowed it to detect whether a few users visited a few select targets” (Mozilla Firefox)

• Disclosure to Yandex

“We can't promise but we plan to study them and provide you with our feedback” (Yandex Safe Browsing team)

• Non-disclosure agreement with Google
  - Launch of Google Safe Browsing API v4 (in June 2016)

“Google does learn the hash prefixes of URLs, but the hash prefixes don't provide much information about the actual URLs” (Google Safe Browsing v4 privacy policy)

Conclusions & Future Work

Conclusions

Lesson learnt: Collisions are hard to tame in security and privacy

Bloom filters

• Developers tend to ignore the worst case of algorithms
• Data structures with ad hoc crypto primitives are at best risky
• Need for secure instantiations, e.g. as in Count-Min sketches

Safe Browsing

• Re-establishes the weakness of the anonymity-set privacy model
• Google and Yandex both employ the same privacy model
  - Google is privacy aware
  - Yandex less so

Future work

Bloom filters

• On Bloom filters: Bloom paradox [Rottenstreich 2015]
• Beyond Bloom filters: security of Bloom filter variants

Safe Browsing

• Accountability: need for a decentralized blacklist management system [Freudiger et al. 2015]
• Privacy: can a local cache improve Private Information Retrieval?

Thank you

Other works not covered today

• Performance of cryptographic accumulators
• Private password auditing
• (In)Security of Google and Yandex Safe Browsing
• Alerting websites: risks and solutions
• Decompression quines and anti-viruses
• Pitfalls of hashing for privacy
• Linkable (zero-knowledge) proofs for private and accountable gossip

Security Bloom Filters

Bloom filters [Bloom 1970]

Setup(m n k)

bull A binary vector ~z of size m compressing a set of n itemsbull k uniform and independent hash functions hi 0 1lowast rarr [0m minus 1]bull ~z initialized to ~0

Operations

bull Insert(x) Set bits of ~z at h1(x) hk(x) to 1

0 0 0 1 1 0 1 1 0 1

S = x1 x2 x3 k = 2

y1 isin S y2 = x2 y3 isin S(false positive)

bull Query(y) Return True if bits of ~z at h1(y) hk(y) are all 1bull False positive rate and its optimum value have been well studied 7

Our contributions

bull Define adversary models for Bloom filtersbull Query-only adversarybull Chosen-insertion adversarybull Deletion adversary

bull Specific to counting Bloom filters (not covered today)

bull DoS attacks on Bloom enabled software applicationsbull Increase false positive probabilitybull Increase query time

bull Worst-case analysis of Bloom filtersbull false-positive probabilitybull new filter parameters

bull Bloom hash tables as a potential replacement for Bloom filters

8

Query-only adversary

Capabilities Only queries to the filterAssumption State of the filter is known

Goals

bull Craft items that generate false positivesbull Probability to forge a false positive is

(wH (~z)

m

)k

wH(middot) is the Hamming weight

bull Or items whose processing leads to latencybull First k minus 1 bits are set to 1 and the k-th bit set to 0

1 0 1 0 1 1 0 1 1 0 1 0

y k = 3

bull The probability of finding such an item is

(m minus wH(~z)) middot(wH (~z)kminus1

)mk

9

Chosen-insertion adversary

Capabilities Can choose items to insert in the filterAssumption State of the filter is knownGoal Increase the false positive probability

Strategy Greedily insert x that maximizes bits set to 1

bull Each x sets k bits to 1

0 0 0 1 0 1 1 1 1 1 1 1

x1 x2 x3 x4

Impact

No attack Under attackbits set to 1 072nkopt nkopt

false positive rate (f ) 12kopt

(nkoptm

)kopt

10

Impact on a sample filter

Parameters m = 3200 n = 600 kopt = 4 fopt = 0077

0 100 200 300 400 500 600Number of inserted items

False positive probability

0

007

014

021

028

035

f advPartialf

fopt

insert last200 items

11

Applying adversary models

Factors enabling our attacks

bull Insecure hash functions

bull Digest truncation

bull High Bloom filter false positive rate

Vulnerable software applications

Software app Hashing Parameter info

Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10

sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014

12

Bypassing a forensic tool

NSRL forensic tool

bull A whitelist of ldquoknown safe filesrdquo

bull Stored and distributed as a Bloom filter

bull Maintained by NIST

A query-only attack Goal is to hide a contraband file

bull Adversary modifies the file to create a false positive

bull Modification should be easily reversible

bull The filter detects the file as safe

13

Countermeasure against chosen-insertion attacks

Use worst-case parameters for Bloom filters

bull Fix m n and choose k that minimizes false positive probability

f adv =

(nk

m

)k

Optimal values are

kadvopt =

m

enand f adv

opt = eminusmen

bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv

opt = 2 f advopt = 01

14

Summary of other attacks amp defenses

Attacks

Software app Attacks

Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only

Defenses

bull Use HMAC

bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table

15

Related work

bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc

bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter

16

Privacy Safe Browsing

Google Safe Browsing in Mozilla Firefox

18

And many others

19

Adverted privacy policy

ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT

ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton

Many Safe Browsing services are privacy unfriendly by design

ldquocannot determine the real URL from the informationreceivedrdquo mdash Google

bull Google seems to provide the most private service

bull Hence focus of this work

20

Google Safe Browsing When Why and How

bull When In 2008 by Google

bull Goals Protect frombull Phishing sitesbull Malware sites

bull How Easy-to-use APIs in C Python and PHP

bull Methodology Blacklists

bull Available in

bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day

bull Cloned by Yandex as Yandex Safe Browsing 21

Lookup API

bull Google harvests phishing and malware URLs to feed a blacklist

bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom

Issues

bull Does not scale Heavy network traffic

bull Privacy URLs are sent in clear

22

Improving privacy using a local cache

Database

Server ClientLocal cache

(3) Conditionalquery

(4) Answer

(1) Query

(2) Answer

Extended client

bull Communication with the server is reduced

bull Better privacy

23

Google Safe Browsing API (v3) Local cache

bull BlacklistsList name Description Entries

goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1

bull Does not handle URLs directly instead their SHA-256 digests

wwwevilcom SHA-256 cc7af8a31918

cc7af8a31918

32-bit prefix

bull Local cache contains prefixes 24

Clientrsquos behavior chart

Userrsquos input

URLCanonicalize andcompute digest

Foundprefix

Get fulldigests

Founddigest

MaliciousURL

Non-maliciousURL

yes

no

yes

no

25

Canonicalization and decompositions

bull Input URLhttpusrpwdabcport12extparam=1frags

bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password

bull Multiple decompositions are checked for a single URL

Decompositions of canonicalized URL

1 abc12extparam=12 abc12ext3 abc14 abc

5 bc12extparam=16 bc12ext7 bc18 bc

bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious

26

Purpose of computing decompositions

Memory saving

bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix

A more intricate example

Unsafe link Safe link

dbcebc

bcabc2

abc1abc

bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc

27

Privacy of Google Safe Browsing

ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy

Our goal A privacy analysis of Google and Yandex Safe Browsing

URL Prefix

wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3

bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set

28

Tracking Safe Browsing users

Our assumptions

bull Google and Yandex have incentives to behave maliciously

bull Wish to learn whether a user visits some selected URLs

How Google can track Safe Browsing users

bull Builds a list of prefixes to track

bull Includes these prefixes in the clientrsquos local cache

bull Learns from the requests whether a user visited a specific URL

bull Key parameter Anonymity-set size

29

Estimating anonymity-set size

bull Anonymity-set size of a prefix URLs that yield the prefix

Year URLs Domains

2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million

bull Estimate anonymity-set size Apply balls-into-bins model

Avg for URLs Avg for Domains

Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast

0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large

30

Sending multiple prefixes

Example with two prefixes

Decomposition Prefix

petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5

Intuitively

bull Prefix for petsymposiumorg is not enough for re-identification

bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix

bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification

31

Ambiguity on two prefixes

bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache

URL Decomposition Prefix

Target URL abcabc A

bc B

Ambiguity

Type I gabcgabc C

abc A

bc B

Type II gbcgbc A larr collision on 32 bits

bc B

Type III defdef A larr collision on 32 bits

ef B larr collision on 32 bits

bull P[Type III] = 1264

bull Type II URLs exist only when decomp on the domain gt 232

bull P[Type I] gt P[Type II] gt P[Type III]

bull Mainly only Type I URLs create ambiguity in re-identification 32

How to track a given URL A real-world example (I)

petsymposiumorg2015

petsymposiumorg

petsymposiumorg2016linksphp

petsymposiumorg2016cfpphp

petsymposiumorg2016faqsphp

Type I URLs for target URL

petsymposiumorg2016

Goal Identify users interested in PETs

bull Target URL is petsymposiumorg2016

33

How to track a given URL A real-world example (II)

bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp

Decomposition Prefix

petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5

bull Including 2 prefixes in the local cache rArr Anonymity set size of 4

bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes

bull Server receives 2 prefixes rArr visited page is the target URL

bull Server receives 3 prefixes rArr visited page is either of the leaf URLs

bull The third prefix decides which leaf URL was visited

bull Generalizable to any number of prefixes 34

Examples of URLs creating multiple hits

bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing

URL matching decomposition

Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf

17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp

httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues

1001cartesorgtag

Yandex

httpfrxhamstercomuservideofrxhamstercomxhamstercom

httpnlxhamstercomuservideonlxhamstercomxhamstercom

httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom

httpmmofoscomuserloginmmofoscommofoscom

httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom

teenslovehugecockscom

bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom

bull No need to add additional prefix for French or Dutch version35

Responsible disclosure and impact

bull Disclosure to Mozilla Firefox

ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox

bull Disclosure to Yandex

ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team

bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)

ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy

36

Conclusions amp Future Work

Conclusions

Lesson learnt Collisions are hard to tame in security and privacy

Bloom filters

bull Developers tend to ignore the worst-case of algorithms

bull Data structures with ad-hoc crypto primitives are at the best risky

bull Need of secure instantiations eg as in Count-Min sketches

Safe Browsing

bull Re-establish the weakness of anonymity-set privacy model

bull Google and Yandex both employ the same privacy modelbull Google is privacy awarebull Yandex less so

38

Future work

Bloom filters

bull On Bloom filters Bloom paradox [Rottenstreich 2015]

bull Beyond Bloom filters Security of Bloom filter variants

Safe Browsing

bull Accountability Need of a decentralized blacklist managementsystem [Freudiger et al 2015]

bull Privacy Can local cache improve Private Information Retrieval

39

Thank you

40

Other works not covered today

bull Performance of cryptographic accumulators

bull Private password auditing

bull (In)Security of Google and Yandex Safe Browsing

bull Alerting websites Risks and solutions

bull Decompression quines and anti-viruses

bull Pitfalls of hashing for privacy

bull Linkable (zero-knowledge) proofs for private and accountable gossip

41

Bloom filters [Bloom 1970]

Setup(m n k)

bull A binary vector ~z of size m compressing a set of n itemsbull k uniform and independent hash functions hi 0 1lowast rarr [0m minus 1]bull ~z initialized to ~0

Operations

bull Insert(x) Set bits of ~z at h1(x) hk(x) to 1

0 0 0 1 1 0 1 1 0 1

S = x1 x2 x3 k = 2

y1 isin S y2 = x2 y3 isin S(false positive)

bull Query(y) Return True if bits of ~z at h1(y) hk(y) are all 1bull False positive rate and its optimum value have been well studied 7

Our contributions

bull Define adversary models for Bloom filtersbull Query-only adversarybull Chosen-insertion adversarybull Deletion adversary

bull Specific to counting Bloom filters (not covered today)

bull DoS attacks on Bloom enabled software applicationsbull Increase false positive probabilitybull Increase query time

bull Worst-case analysis of Bloom filtersbull false-positive probabilitybull new filter parameters

bull Bloom hash tables as a potential replacement for Bloom filters

8

Query-only adversary

Capabilities Only queries to the filterAssumption State of the filter is known

Goals

bull Craft items that generate false positivesbull Probability to forge a false positive is

(wH (~z)

m

)k

wH(middot) is the Hamming weight

bull Or items whose processing leads to latencybull First k minus 1 bits are set to 1 and the k-th bit set to 0

1 0 1 0 1 1 0 1 1 0 1 0

y k = 3

bull The probability of finding such an item is

(m minus wH(~z)) middot(wH (~z)kminus1

)mk

9

Chosen-insertion adversary

Capabilities Can choose items to insert in the filterAssumption State of the filter is knownGoal Increase the false positive probability

Strategy Greedily insert x that maximizes bits set to 1

bull Each x sets k bits to 1

0 0 0 1 0 1 1 1 1 1 1 1

x1 x2 x3 x4

Impact

No attack Under attackbits set to 1 072nkopt nkopt

false positive rate (f ) 12kopt

(nkoptm

)kopt

10

Impact on a sample filter

Parameters m = 3200 n = 600 kopt = 4 fopt = 0077

0 100 200 300 400 500 600Number of inserted items

False positive probability

0

007

014

021

028

035

f advPartialf

fopt

insert last200 items

11

Applying adversary models

Factors enabling our attacks

bull Insecure hash functions

bull Digest truncation

bull High Bloom filter false positive rate

Vulnerable software applications

Software app Hashing Parameter info

Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10

sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014

12

Bypassing a forensic tool

NSRL forensic tool

bull A whitelist of ldquoknown safe filesrdquo

bull Stored and distributed as a Bloom filter

bull Maintained by NIST

A query-only attack Goal is to hide a contraband file

bull Adversary modifies the file to create a false positive

bull Modification should be easily reversible

bull The filter detects the file as safe

13

Countermeasure against chosen-insertion attacks

Use worst-case parameters for Bloom filters

bull Fix m n and choose k that minimizes false positive probability

f adv =

(nk

m

)k

Optimal values are

kadvopt =

m

enand f adv

opt = eminusmen

bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv

opt = 2 f advopt = 01

14

Summary of other attacks amp defenses

Attacks

Software app Attacks

Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only

Defenses

bull Use HMAC

bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table

15

Related work

bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc

bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter

16

Privacy Safe Browsing

Google Safe Browsing in Mozilla Firefox

18

And many others

19

Adverted privacy policy

ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT

ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton

Many Safe Browsing services are privacy unfriendly by design

ldquocannot determine the real URL from the informationreceivedrdquo mdash Google

bull Google seems to provide the most private service

bull Hence focus of this work

20

Google Safe Browsing: When, Why and How

• When: In 2008, by Google

• Goals: Protect from
  • Phishing sites
  • Malware sites

• How: Easy-to-use APIs in C, Python and PHP

• Methodology: Blacklists

• Available in: [Figure: logos]

• Impact
  • Billions of users
  • Detects thousands of new malicious websites per day

• Cloned by Yandex as Yandex Safe Browsing

Lookup API

• Google harvests phishing and malware URLs to feed a blacklist

• Client checks the status of a URL (here example.com) using a simple HTTP GET/POST request to sb-ssl.google.com/safebrowsing/api/lookup

Issues

• Does not scale: Heavy network traffic

• Privacy: URLs are sent in the clear

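Improving privacy using a local cache

[Figure: an extended client, made of the client and a local cache, facing a server backed by a database. (1) Query and (2) Answer are exchanged with the local cache; (3) Conditional query and (4) Answer are exchanged with the server.]

• Communication with the server is reduced

• Better privacy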

Google Safe Browsing API (v3): Local cache

• Blacklists

List name                  Description          Entries
goog-malware-shavar        malware              317807
googpub-phish-shavar       phishing             312621
goog-regtest-shavar        test file            29667
goog-unwanted-shavar       unwanted software
goog-whitedomain-shavar    unused               1

• Does not handle URLs directly, but instead their SHA-256 digests

[Figure: www.evil.com → SHA-256 → cc7af8a31918; the leading cc7af8a3 is the 32-bit prefix]

• Local cache contains prefixes
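The digest-and-prefix step can be sketched in a few lines. The function name `digest_and_prefix` and the exact string that is hashed (including the trailing slash) are assumptions, so the printed value is not claimed to reproduce the prefix shown on the slide.

```python
import hashlib


def digest_and_prefix(expression, prefix_bytes=4):
    """Return the full SHA-256 digest and its leading 32-bit (4-byte) prefix."""
    digest = hashlib.sha256(expression.encode("utf-8")).digest()
    return digest, digest[:prefix_bytes]


digest, prefix = digest_and_prefix("www.evil.com/")
print(digest.hex())   # 64 hex characters: the full digest kept by the server
print(prefix.hex())   # 8 hex characters: the part stored in the local cache
```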

Client’s behavior chart

[Flowchart: User’s input URL → Canonicalize and compute digest → Found prefix? If no: non-malicious URL. If yes: Get full digests → Found digest? If yes: malicious URL. If no: non-malicious URL.]
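The decision flow of the client can be sketched as below. The names `is_malicious` and `get_full_digests` are hypothetical; the latter stands in for the network request that returns the full digests matching a prefix, and the input is assumed to be already canonicalized and decomposed.

```python
import hashlib


def sha256(expression):
    return hashlib.sha256(expression.encode("utf-8")).digest()


def is_malicious(decompositions, local_prefixes, get_full_digests):
    """Follow the chart: local prefix check first, server request only on a hit."""
    for expression in decompositions:
        digest = sha256(expression)
        prefix = digest[:4]                       # 32-bit prefix
        if prefix not in local_prefixes:
            continue                              # "Found prefix?" -> no
        if digest in get_full_digests(prefix):    # the prefix is revealed to the server here
            return True                           # "Found digest?" -> yes
    return False


# Toy setup: one blacklisted expression, served by an in-memory "server".
blacklist = {sha256("evil.example/")}
local_cache = {d[:4] for d in blacklist}
server = lambda prefix: {d for d in blacklist if d[:4] == prefix}

print(is_malicious(["evil.example/page", "evil.example/"], local_cache, server))
print(is_malicious(["good.example/"], local_cache, server))
```

Only the branch that calls `get_full_digests` leaks anything to the server, which is why the content of the local cache matters for privacy.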

Canonicalization and decompositions

• Input URL: http://usr:pwd@a.b.c:port/1/2.ext?param=1#frags

• Canonicalize(Input URL) → http://a.b.c/1/2.ext?param=1
  • Canonicalization for privacy too: Removes username and password

• Multiple decompositions are checked for a single URL

Decompositions of canonicalized URL

1  a.b.c/1/2.ext?param=1
2  a.b.c/1/2.ext
3  a.b.c/1/
4  a.b.c/
5  b.c/1/2.ext?param=1
6  b.c/1/2.ext
7  b.c/1/
8  b.c/

• Each matching prefix is sent to the server
  • Any matching full digest ⇒ Initial URL is malicious
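The eight decompositions above can be reproduced with a short sketch. It assumes the commonly documented limits (at most five host suffixes, at most four root-based path prefixes) and takes an already canonicalized host, path and query; the function name `decompositions` is hypothetical and the canonicalization step itself is not covered.

```python
def decompositions(host, path, query=""):
    """List host-suffix/path-prefix combinations of a canonicalized URL."""
    # Host candidates: the exact host, then suffixes of its last five labels,
    # always keeping at least two labels.
    tail = host.split(".")[-5:]
    hosts = [host]
    for i in range(len(tail) - 1):
        candidate = ".".join(tail[i:])
        if candidate not in hosts:
            hosts.append(candidate)

    # Path candidates: exact path with query, exact path, then prefixes from the root.
    paths = []
    if query:
        paths.append(path + "?" + query)
    paths.append(path)
    segments = [s for s in path.split("/") if s]
    prefix = "/"
    roots = [prefix]
    for segment in segments[:-1]:
        prefix += segment + "/"
        roots.append(prefix)
    for candidate in roots[:4]:
        if candidate not in paths:
            paths.append(candidate)

    return [h + p for h in hosts for p in paths]


for expression in decompositions("a.b.c", "/1/2.ext", "param=1"):
    print(expression)
```

The order of the output differs from the numbered list, but the set of expressions is the same.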

Purpose of computing decompositions

Memory saving

• A domain which hosts only malicious URLs
  • Naive blacklisting: Include all malicious prefixes in the local cache
  • Memory-efficient blacklisting: Include only the domain prefix

A more intricate example

[Figure: link tree rooted at b.c with sub-domains a.b.c, d.b.c and e.b.c and pages a.b.c/1 and a.b.c/2; each link is marked as unsafe or safe.]

• Naive blacklisting: Include a.b.c/1, a.b.c/2 and d.b.c
• Memory-efficient blacklisting: Include only a.b.c and d.b.c

Privacy of Google Safe Browsing

“Google cannot determine the real URL from the information received” -- Google Safe Browsing v3 privacy policy

Our goal: A privacy analysis of Google and Yandex Safe Browsing

URL                             Prefix
www.evil.com                    cc7af8a3
www.example-1.com/11893474      cc7af8a3
www.example-2.com/5234456210    cc7af8a3
www.example-3.com/616445242     cc7af8a3

• Privacy due to anonymity-set
  • Estimate the anonymity-set size
  • Does it suffice to have a large anonymity-set?

Tracking Safe Browsing users

Our assumptions

• Google and Yandex have incentives to behave maliciously

• Wish to learn whether a user visits some selected URLs

How Google can track Safe Browsing users

• Builds a list of prefixes to track

• Includes these prefixes in the client’s local cache

• Learns from the requests whether a user visited a specific URL

• Key parameter: Anonymity-set size

Estimating anonymity-set size

• Anonymity-set size of a prefix: number of URLs that yield the prefix

Year    URLs          Domains
2008    1 Billion     177 Million
2012    30 Billion    252 Million
2013    60 Billion    271 Million

• Estimate anonymity-set size: Apply balls-into-bins model

Prefix length    Avg. for URLs                       Avg. for Domains
(bits)           2008       2012       2013          2008    2012    2013
16               $2^{23}$   $2^{28}$   $2^{29}$      2700    3845    4135
32               232        6984       13969         0.04    0.05    0.06
64               0*         0*         0*            0*      0*      0*

0* is very close to 0

• Domains and URLs cannot be distinguished
• Anonymity-set size seems to be large
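Under the balls-into-bins model, n items thrown uniformly into $2^\ell$ bins give an average of $n / 2^\ell$ items per prefix. A small sketch, using only the domain counts from the table (the function name `average_anonymity_set` is an assumption), reproduces the "Avg. for Domains" columns once the values are truncated:

```python
def average_anonymity_set(n_items, prefix_bits):
    """Average number of items sharing one prefix: n balls into 2**prefix_bits bins."""
    return n_items / 2 ** prefix_bits


domains = {2008: 177_000_000, 2012: 252_000_000, 2013: 271_000_000}

for year, n in domains.items():
    avg16 = average_anonymity_set(n, 16)
    avg32 = average_anonymity_set(n, 32)
    # Truncate rather than round, which is what the table values appear to do.
    print(year, int(avg16), int(avg32 * 100) / 100)
```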

Sending multiple prefixes

Example with two prefixes

Decomposition                    Prefix
petsymposium.org/2016/cfp.php    0xe70ee6d1
petsymposium.org/2016/           0x1d13ba6a
petsymposium.org/                0x33a02ef5

Intuitively

• Prefix for petsymposium.org/ is not enough for re-identification

• Sending two 32-bit prefixes for petsymposium.org/ and petsymposium.org/2016/ ≈ sending one 64-bit prefix

• The maximum anonymity-set size for 64-bit prefixes is 1 ⇒ Should lead to re-identification

Ambiguity on two prefixes

• More than two distinct URLs may yield the same two prefixes
• Consider a user visiting a.b.c with prefixes A and B in the local cache

                         URL        Decomposition    Prefix
Target URL               a.b.c      a.b.c            A
                                    b.c              B
Ambiguity    Type I      g.a.b.c    g.a.b.c          C
                                    a.b.c            A
                                    b.c              B
             Type II     g.b.c      g.b.c            A  ← collision on 32 bits
                                    b.c              B
             Type III    d.e.f      d.e.f            A  ← collision on 32 bits
                                    e.f              B  ← collision on 32 bits

• $P[\text{Type III}] = 1/2^{64}$

• Type II URLs exist only when the number of decompositions on the domain exceeds $2^{32}$

• P[Type I] > P[Type II] > P[Type III]

• Mainly, only Type I URLs create ambiguity in re-identification

How to track a given URL: A real-world example (I)

[Figure: URL tree of petsymposium.org/ with children petsymposium.org/2015/ and petsymposium.org/2016/; the leaves petsymposium.org/2016/links.php, petsymposium.org/2016/cfp.php and petsymposium.org/2016/faqs.php are the Type I URLs for the target URL.]

Goal: Identify users interested in PETs

• Target URL is petsymposium.org/2016/

How to track a given URL: A real-world example (II)

• Target URL has Type I ambiguity with cfp.php, links.php and faqs.php

Decomposition                     Prefix
petsymposium.org/2016/faqs.php    0xaec10b3a
petsymposium.org/2016/            0x1d13ba6a
petsymposium.org/                 0x33a02ef5

• Including 2 prefixes in the local cache ⇒ Anonymity-set size of 4

• To remove any ambiguity
  • Need to include 3 additional prefixes, for cfp.php, links.php and faqs.php
  • A total of 5 prefixes

• Server receives 2 prefixes ⇒ visited page is the target URL

• Server receives 3 prefixes ⇒ visited page is one of the leaf URLs

• The third prefix decides which leaf URL was visited

• Generalizable to any number of prefixes
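The five-prefix tracking scheme can be simulated end to end. This is a sketch under simplifying assumptions: prefixes are recomputed with SHA-256 over the strings shown (so they are not claimed to equal the hexadecimal values on the slides), only path decompositions of a single host are considered, and the names `client_decompositions`, `prefixes_sent` and `reidentify` are hypothetical.

```python
import hashlib

BASE = "petsymposium.org/"
TARGET = BASE + "2016/"
LEAVES = [TARGET + page for page in ("cfp.php", "links.php", "faqs.php")]


def prefix32(expression):
    return hashlib.sha256(expression.encode("utf-8")).digest()[:4]


def client_decompositions(url):
    """The URL itself, the domain root, and every intermediate directory."""
    host, _, path = url.partition("/")
    parts = [p for p in path.split("/") if p]
    out = {url, host + "/"}
    prefix = host + "/"
    for part in parts[:-1]:
        prefix += part + "/"
        out.add(prefix)
    return out


# Server side: the 5 prefixes pushed into the client's local cache.
tracked = {prefix32(u): u for u in [BASE, TARGET] + LEAVES}


def prefixes_sent(url):
    """Prefixes the client reveals: those of its decompositions found in the cache."""
    return {prefix32(d) for d in client_decompositions(url)} & set(tracked)


def reidentify(received):
    """Server-side inference from the set of received prefixes."""
    urls = {tracked[p] for p in received}
    leaves = [u for u in LEAVES if u in urls]
    if leaves:
        return leaves[0]        # 3 prefixes: the third one names the leaf
    if TARGET in urls:
        return TARGET           # 2 prefixes: the target URL itself
    return None                 # only the domain prefix: not re-identified


for url in [TARGET] + LEAVES + [BASE + "2015/"]:
    print(url, len(prefixes_sent(url)), reidentify(prefixes_sent(url)))
```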

Examples of URLs creating multiple hits

• Over 1300 such URLs distributed over 30 domains
• More frequent in Yandex than in Google Safe Browsing

          URL                                               Matching decompositions
Google    http://wps3b.17buddies.net/wp/cs_sub_7-2.pwf      17buddies.net/wp/cs_sub_7-2.pwf
                                                            17buddies.net/wp/
          http://www.1001cartes.org/tag/emergency-issues    1001cartes.org/tag/emergency-issues
                                                            1001cartes.org/tag/
Yandex    http://fr.xhamster.com/user/video                 fr.xhamster.com/
                                                            xhamster.com/
          http://nl.xhamster.com/user/video                 nl.xhamster.com/
                                                            xhamster.com/
          http://m.wickedpictures.com/user/login            m.wickedpictures.com/
                                                            wickedpictures.com/
          http://m.mofos.com/user/login                     m.mofos.com/
                                                            mofos.com/
          http://mobile.teenslovehugecocks.com/user/join    mobile.teenslovehugecocks.com/
                                                            teenslovehugecocks.com/

• Including a single prefix for xhamster.com/ blacklists both fr.xhamster.com and nl.xhamster.com

• No need to add an additional prefix for the French or Dutch version

Responsible disclosure and impact

• Disclosure to Mozilla Firefox

“We have long assumed (without the math to back it up) that if Google were evil it could seed the list with prefixes that allowed it to detect whether a few users visited a few select targets” -- Mozilla Firefox

• Disclosure to Yandex

“We can’t promise but we plan to study them and provide you with our feedback” -- Yandex Safe Browsing team

• Non-disclosure agreement with Google
  • Launch of Google Safe Browsing API v4 (in June 2016)

“Google does learn the hash prefixes of URLs, but the hash prefixes don’t provide much information about the actual URLs” -- Google Safe Browsing v4 privacy policy

Conclusions & Future Work

Conclusions

Lesson learnt: Collisions are hard to tame in security and privacy

Bloom filters

• Developers tend to ignore the worst case of algorithms

• Data structures with ad-hoc crypto primitives are risky at best

• Need for secure instantiations, e.g. as in Count-Min sketches

Safe Browsing

• Re-establishes the weakness of the anonymity-set privacy model

• Google and Yandex both employ the same privacy model
  • Google is privacy aware
  • Yandex less so

Future work

Bloom filters

• On Bloom filters: Bloom paradox [Rottenstreich 2015]

• Beyond Bloom filters: Security of Bloom filter variants

Safe Browsing

• Accountability: Need for a decentralized blacklist management system [Freudiger et al. 2015]

• Privacy: Can a local cache improve Private Information Retrieval?

Thank you

Other works not covered today

• Performance of cryptographic accumulators

• Private password auditing

• (In)Security of Google and Yandex Safe Browsing

• Alerting websites: Risks and solutions

• Decompression quines and anti-viruses

• Pitfalls of hashing for privacy

• Linkable (zero-knowledge) proofs for private and accountable gossip

Our contributions

• Define adversary models for Bloom filters
  • Query-only adversary
  • Chosen-insertion adversary
  • Deletion adversary
    • Specific to counting Bloom filters (not covered today)

• DoS attacks on Bloom-enabled software applications
  • Increase false positive probability
  • Increase query time

• Worst-case analysis of Bloom filters
  • False positive probability
  • New filter parameters

• Bloom hash tables as a potential replacement for Bloom filters

Query-only adversary

Capabilities: Only queries to the filter
Assumption: State of the filter is known

Goals

• Craft items that generate false positives
  • Probability to forge a false positive is $\left(\frac{w_H(\vec{z})}{m}\right)^k$, where $w_H(\cdot)$ is the Hamming weight and $\vec{z}$ is the bit vector of the filter

• Or items whose processing leads to latency
  • First k − 1 bits are set to 1 and the k-th bit set to 0

  [Figure: filter state 1 0 1 0 1 1 0 1 1 0 1 0 queried with an item y, k = 3]

  • The probability of finding such an item is $\frac{(m - w_H(\vec{z})) \cdot w_H(\vec{z})^{k-1}}{m^k}$
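Both probabilities can be evaluated on the 12-bit filter state from the figure. The function names are hypothetical, and the indexes of a candidate item are assumed to be independent and uniform over the m positions; exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def false_positive_probability(bits, k):
    """(w_H(z) / m)^k: all k indexes of a random item hit bits set to 1."""
    m, weight = len(bits), sum(bits)
    return Fraction(weight, m) ** k


def latency_item_probability(bits, k):
    """First k-1 indexes hit bits set to 1, and the k-th hits a bit set to 0."""
    m, weight = len(bits), sum(bits)
    return Fraction((m - weight) * weight ** (k - 1), m ** k)


z = [1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]   # m = 12, Hamming weight 7
print(false_positive_probability(z, k=3))   # 343/1728
print(latency_item_probability(z, k=3))     # 245/1728
```

A latency item is the worst case for a query that stops at the first 0 bit: the filter evaluates all k indexes before rejecting it.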

Chosen-insertion adversary

Capabilities: Can choose items to insert in the filter
Assumption: State of the filter is known
Goal: Increase the false positive probability

Strategy: Greedily insert x that maximizes bits set to 1

• Each x sets k bits to 1

[Figure: filter state 0 0 0 1 0 1 1 1 1 1 1 1 with inserted items x1, x2, x3, x4]

Impact

                           No attack             Under attack
bits set to 1              $0.72\, n k_{opt}$    $n k_{opt}$
false positive rate (f)    $1/2^{k_{opt}}$       $(n k_{opt}/m)^{k_{opt}}$
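The greedy strategy can be simulated on a toy filter. This is an illustrative sketch only: SHA-256 stands in for whatever hash the target application uses, the helper names are hypothetical, and the adversary simply brute-forces candidate items until one sets k fresh bits. The parameters are those of the sample filter used in this talk (m = 3200, n = 600, k = 4).

```python
import hashlib


def indexes(item, m, k):
    """Derive k indexes from one SHA-256 digest (4 bytes per index, so k <= 8)."""
    digest = hashlib.sha256(item).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % m for i in range(k)]


def chosen_insertion_attack(m, n, k):
    """Insert n items, each chosen so that it sets k previously unset bits."""
    bits = [0] * m
    counter = 0
    for _ in range(n):
        while True:
            candidate = b"item-%d" % counter
            counter += 1
            idx = indexes(candidate, m, k)
            if len(set(idx)) == k and not any(bits[i] for i in idx):
                break
        for i in idx:
            bits[i] = 1
    return bits


m, n, k = 3200, 600, 4
bits = chosen_insertion_attack(m, n, k)
weight = sum(bits)
print(weight)               # n * k bits set to 1
print((weight / m) ** k)    # resulting false positive probability (nk/m)^k
```

With these parameters the attack sets 2400 of the 3200 bits, so the false positive probability becomes $0.75^4 \approx 0.32$, against $f_{opt} = 0.077$ without an attack.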

Impact on a sample filter

Parameters: m = 3200, n = 600, $k_{opt} = 4$, $f_{opt} = 0.077$

[Figure: false positive probability (y-axis, 0 to 0.35) against the number of inserted items (x-axis, 0 to 600); curves: $f^{adv}$, Partial (insert last 200 items), f, and $f_{opt}$.]

Applying adversary models

Factors enabling our attacks

• Insecure hash functions

• Digest truncation

• High Bloom filter false positive rate

Vulnerable software applications

Software app              Hashing       Parameter info
Scrapy (Web crawler)      NA            NA
Dablooms (Spam filter)    MurmurHash    n = 100000, f = 0.057
Squid (Web proxy)         MD5           f = 0.09, k = 4
AIEngine (NIDS)           C++ hash      ℓ = 13, n = 5000, f = 0.45
NSRL (Forensic tool)      SHA-1         ℓ = 32, $n \approx 14 \times 10^{6}$, $f = 8.08 \times 10^{-10}$
sdhash (Forensic tool)    SHA-1         ℓ = 11, n = 128, f = 0.0014

Bypassing a forensic tool

NSRL forensic tool

• A whitelist of “known safe files”

• Stored and distributed as a Bloom filter

• Maintained by NIST

A query-only attack: Goal is to hide a contraband file

• Adversary modifies the file to create a false positive

• Modification should be easily reversible

• The filter detects the file as safe

Countermeasure against chosen-insertion attacks

Use worst-case parameters for Bloom filters

• Fix m, n and choose k that minimizes the false positive probability

  $f^{adv} = \left(\frac{nk}{m}\right)^k$

Optimal values are

  $k^{adv}_{opt} = \frac{m}{en}$ and $f^{adv}_{opt} = e^{-m/(en)}$

• Impact: On a sample Bloom filter with m = 3200, n = 600
  • Average case: $k_{opt} = 4$, $f_{opt} = 0.077$
  • Worst case: $k^{adv}_{opt} = 2$, $f^{adv}_{opt} = 0.1$
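The optimal worst-case parameters follow from a one-line minimization. The derivation below is a sketch that treats k as a real variable, which is an assumption; in practice k is rounded to an integer.

```latex
\begin{align*}
\ln f^{adv}(k) &= k \ln\frac{nk}{m}, \\
\frac{d}{dk}\,\ln f^{adv}(k) &= \ln\frac{nk}{m} + 1 = 0
  \;\Longrightarrow\; \frac{nk}{m} = \frac{1}{e}
  \;\Longrightarrow\; k^{adv}_{opt} = \frac{m}{en}, \\
f^{adv}_{opt} &= \left(\frac{1}{e}\right)^{k^{adv}_{opt}} = e^{-m/(en)}.
\end{align*}
```

The second derivative of $\ln f^{adv}$ is $1/k > 0$, so this stationary point is a minimum. For m = 3200 and n = 600, $m/(en) \approx 1.96$, which gives the integer choice $k^{adv}_{opt} = 2$.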


Query-only adversary

Capabilities Only queries to the filterAssumption State of the filter is known

Goals

bull Craft items that generate false positivesbull Probability to forge a false positive is

(wH (~z)

m

)k

wH(middot) is the Hamming weight

bull Or items whose processing leads to latencybull First k minus 1 bits are set to 1 and the k-th bit set to 0

1 0 1 0 1 1 0 1 1 0 1 0

y k = 3

bull The probability of finding such an item is

(m minus wH(~z)) middot(wH (~z)kminus1

)mk

9

Chosen-insertion adversary

Capabilities Can choose items to insert in the filterAssumption State of the filter is knownGoal Increase the false positive probability

Strategy Greedily insert x that maximizes bits set to 1

bull Each x sets k bits to 1

0 0 0 1 0 1 1 1 1 1 1 1

x1 x2 x3 x4

Impact

No attack Under attackbits set to 1 072nkopt nkopt

false positive rate (f ) 12kopt

(nkoptm

)kopt

10

Impact on a sample filter

Parameters m = 3200 n = 600 kopt = 4 fopt = 0077

0 100 200 300 400 500 600Number of inserted items

False positive probability

0

007

014

021

028

035

f advPartialf

fopt

insert last200 items

11

Applying adversary models

Factors enabling our attacks

bull Insecure hash functions

bull Digest truncation

bull High Bloom filter false positive rate

Vulnerable software applications

Software app Hashing Parameter info

Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10

sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014

12

Bypassing a forensic tool

NSRL forensic tool

bull A whitelist of ldquoknown safe filesrdquo

bull Stored and distributed as a Bloom filter

bull Maintained by NIST

A query-only attack Goal is to hide a contraband file

bull Adversary modifies the file to create a false positive

bull Modification should be easily reversible

bull The filter detects the file as safe

13

Countermeasure against chosen-insertion attacks

Use worst-case parameters for Bloom filters

bull Fix m n and choose k that minimizes false positive probability

f adv =

(nk

m

)k

Optimal values are

kadvopt =

m

enand f adv

opt = eminusmen

bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv

opt = 2 f advopt = 01

14

Summary of other attacks amp defenses

Attacks

Software app Attacks

Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only

Defenses

bull Use HMAC

bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table

15

Related work

bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc

bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter

16

Privacy Safe Browsing

Google Safe Browsing in Mozilla Firefox

18

And many others

19

Adverted privacy policy

ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT

ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton

Many Safe Browsing services are privacy unfriendly by design

ldquocannot determine the real URL from the informationreceivedrdquo mdash Google

bull Google seems to provide the most private service

bull Hence focus of this work

20

Google Safe Browsing When Why and How

bull When In 2008 by Google

bull Goals Protect frombull Phishing sitesbull Malware sites

bull How Easy-to-use APIs in C Python and PHP

bull Methodology Blacklists

bull Available in

bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day

bull Cloned by Yandex as Yandex Safe Browsing 21

Lookup API

bull Google harvests phishing and malware URLs to feed a blacklist

bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom

Issues

bull Does not scale Heavy network traffic

bull Privacy URLs are sent in clear

22

Improving privacy using a local cache

Database

Server ClientLocal cache

(3) Conditionalquery

(4) Answer

(1) Query

(2) Answer

Extended client

bull Communication with the server is reduced

bull Better privacy

23

Google Safe Browsing API (v3) Local cache

bull BlacklistsList name Description Entries

goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1

bull Does not handle URLs directly instead their SHA-256 digests

wwwevilcom SHA-256 cc7af8a31918

cc7af8a31918

32-bit prefix

bull Local cache contains prefixes 24

Clientrsquos behavior chart

Userrsquos input

URLCanonicalize andcompute digest

Foundprefix

Get fulldigests

Founddigest

MaliciousURL

Non-maliciousURL

yes

no

yes

no

25

Canonicalization and decompositions

bull Input URLhttpusrpwdabcport12extparam=1frags

bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password

bull Multiple decompositions are checked for a single URL

Decompositions of canonicalized URL

1 abc12extparam=12 abc12ext3 abc14 abc

5 bc12extparam=16 bc12ext7 bc18 bc

bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious

26

Purpose of computing decompositions

Memory saving

bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix

A more intricate example

Unsafe link Safe link

dbcebc

bcabc2

abc1abc

bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc

27

Privacy of Google Safe Browsing

ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy

Our goal A privacy analysis of Google and Yandex Safe Browsing

URL Prefix

wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3

bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set

28

Tracking Safe Browsing users

Our assumptions

bull Google and Yandex have incentives to behave maliciously

bull Wish to learn whether a user visits some selected URLs

How Google can track Safe Browsing users

bull Builds a list of prefixes to track

bull Includes these prefixes in the clientrsquos local cache

bull Learns from the requests whether a user visited a specific URL

bull Key parameter Anonymity-set size

29

Estimating anonymity-set size

bull Anonymity-set size of a prefix URLs that yield the prefix

Year URLs Domains

2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million

bull Estimate anonymity-set size Apply balls-into-bins model

Avg for URLs Avg for Domains

Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast

0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large

30

Sending multiple prefixes

Example with two prefixes

Decomposition Prefix

petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5

Intuitively

bull Prefix for petsymposiumorg is not enough for re-identification

bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix

bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification

31

Ambiguity on two prefixes

bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache

URL Decomposition Prefix

Target URL abcabc A

bc B

Ambiguity

Type I gabcgabc C

abc A

bc B

Type II gbcgbc A larr collision on 32 bits

bc B

Type III defdef A larr collision on 32 bits

ef B larr collision on 32 bits

bull P[Type III] = 1264

bull Type II URLs exist only when decomp on the domain gt 232

bull P[Type I] gt P[Type II] gt P[Type III]

bull Mainly only Type I URLs create ambiguity in re-identification 32

How to track a given URL A real-world example (I)

petsymposiumorg2015

petsymposiumorg

petsymposiumorg2016linksphp

petsymposiumorg2016cfpphp

petsymposiumorg2016faqsphp

Type I URLs for target URL

petsymposiumorg2016

Goal Identify users interested in PETs

bull Target URL is petsymposiumorg2016

33

How to track a given URL A real-world example (II)

bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp

Decomposition Prefix

petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5

bull Including 2 prefixes in the local cache rArr Anonymity set size of 4

bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes

bull Server receives 2 prefixes rArr visited page is the target URL

bull Server receives 3 prefixes rArr visited page is either of the leaf URLs

bull The third prefix decides which leaf URL was visited

bull Generalizable to any number of prefixes 34

Examples of URLs creating multiple hits

• Over 1300 such URLs, distributed over 30 domains
• More frequent in Yandex than in Google Safe Browsing

URL                                                 Matching decompositions

Google
http://wps3b.17buddies.net/wp/cs_sub_7-2.pwf        17buddies.net/wp/cs_sub_7-2.pwf
                                                    17buddies.net/wp/
http://www.1001cartes.org/tag/emergency-issues      1001cartes.org/tag/emergency-issues
                                                    1001cartes.org/tag/

Yandex
http://fr.xhamster.com/user/video                   fr.xhamster.com/
                                                    xhamster.com/
http://nl.xhamster.com/user/video                   nl.xhamster.com/
                                                    xhamster.com/
http://m.wickedpictures.com/user/login              m.wickedpictures.com/
                                                    wickedpictures.com/
http://m.mofos.com/user/login                       m.mofos.com/
                                                    mofos.com/
http://mobile.teenslovehugecocks.com/user/join      mobile.teenslovehugecocks.com/
                                                    teenslovehugecocks.com/

• Including a single prefix for xhamster.com blacklists both fr.xhamster.com and nl.xhamster.com
• No need to add an additional prefix for the French or Dutch version

Responsible disclosure and impact

• Disclosure to Mozilla Firefox

  “We have long assumed (without the math to back it up) that if Google were evil it could seed the list with prefixes that allowed it to detect whether a few users visited a few select targets” - Mozilla Firefox

• Disclosure to Yandex

  “We can’t promise but we plan to study them and provide you with our feedback” - Yandex Safe Browsing team

• Non-disclosure agreement with Google
• Launch of Google Safe Browsing API v4 (in June 2016)

  “Google does learn the hash prefixes of URLs, but the hash prefixes don’t provide much information about the actual URLs” - Google Safe Browsing v4 privacy policy

Conclusions & Future Work

Conclusions

Lesson learnt: Collisions are hard to tame in security and privacy

Bloom filters

• Developers tend to ignore the worst case of algorithms
• Data structures with ad-hoc crypto primitives are risky at best
• Need for secure instantiations, e.g., as in Count-Min sketches

Safe Browsing

• Re-establishes the weakness of the anonymity-set privacy model
• Google and Yandex both employ the same privacy model
  • Google is privacy aware
  • Yandex less so

Future work

Bloom filters

• On Bloom filters: Bloom paradox [Rottenstreich 2015]
• Beyond Bloom filters: Security of Bloom filter variants

Safe Browsing

• Accountability: Need for a decentralized blacklist management system [Freudiger et al. 2015]
• Privacy: Can a local cache improve Private Information Retrieval?

Thank you

Other works not covered today

• Performance of cryptographic accumulators
• Private password auditing
• (In)Security of Google and Yandex Safe Browsing
• Alerting websites: Risks and solutions
• Decompression quines and anti-viruses
• Pitfalls of hashing for privacy
• Linkable (zero-knowledge) proofs for private and accountable gossip

Chosen-insertion adversary

Capabilities: Can choose items to insert in the filter
Assumption: State of the filter is known
Goal: Increase the false positive probability

Strategy: Greedily insert the x that maximizes the number of bits set to 1

• Each x sets k bits to 1

[Figure: filter state 0 0 0 1 0 1 1 1 1 1 1 1 after inserting x1, x2, x3, x4]

Impact

                            No attack              Under attack
bits set to 1               $0.72\,n\,k_{opt}$     $n\,k_{opt}$
false positive rate (f)     $(1/2)^{k_{opt}}$      $(n\,k_{opt}/m)^{k_{opt}}$
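The greedy strategy above can be simulated with a toy filter. The sketch below is an illustration under assumptions, not one of the attacked implementations: the index derivation (`indexes`, SHA-256 with a one-byte counter) and the candidate items (decimal strings) are hypothetical choices, and the parameters are scaled down so the search finishes quickly.

```python
import hashlib
from itertools import count

def indexes(item: bytes, k: int, m: int) -> list:
    """k filter positions for an item (illustrative derivation, not a real tool's)."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + item).digest()[:8], "big") % m
        for i in range(k)
    ]

def greedy_insertions(m: int, n: int, k: int) -> list:
    """Insert n adversarially chosen items, each setting k previously unset bits."""
    bits = [0] * m
    candidates = count()              # the adversary enumerates candidate items
    for _ in range(n):
        for c in candidates:
            pos = indexes(str(c).encode(), k, m)
            # Keep only candidates whose k positions are distinct and all still 0.
            if len(set(pos)) == k and all(bits[p] == 0 for p in pos):
                for p in pos:
                    bits[p] = 1
                break
    return bits

m, n, k = 320, 60, 4                  # scaled-down toy parameters
bits = greedy_insertions(m, n, k)
fill = sum(bits) / m
print(sum(bits), "bits set; false positive probability", fill ** k)
```

Because every inserted item sets k fresh bits, the filter ends with exactly n·k bits set, which is the "under attack" column of the table above.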

Impact on a sample filter

Parameters: m = 3200, n = 600, $k_{opt}$ = 4, $f_{opt}$ = 0.077

[Figure: false positive probability (0 to 0.35) against the number of inserted items (0 to 600). Curves: $f^{adv}$, "Partial" (the adversary inserts only the last 200 items) and $f$, with $f_{opt}$ marked as the reference level.]
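The two end points of these curves follow from the standard formulas. The short sketch below evaluates them for the sample parameters; the function names are illustrative, and the average-case expression is the usual approximation $(1 - e^{-kn/m})^k$.

```python
import math

def fpp_average(m: int, n: int, k: int) -> float:
    """Usual average-case approximation of the false positive probability."""
    return (1 - math.exp(-k * n / m)) ** k

def fpp_adversarial(m: int, n: int, k: int) -> float:
    """Worst case: each of the n inserted items sets k fresh bits (capped at a full filter)."""
    return min(1.0, n * k / m) ** k

m, n, k = 3200, 600, 4
print(f"average case: {fpp_average(m, n, k):.4f}")
print(f"under attack: {fpp_adversarial(m, n, k):.4f}")
```

For these parameters the adversarial value is roughly four times the average-case one.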

Applying adversary models

Factors enabling our attacks

• Insecure hash functions
• Digest truncation
• High Bloom filter false positive rate

Vulnerable software applications

Software app               Hashing       Parameter info
Scrapy (Web crawler)       N/A           N/A
Dablooms (Spam filter)     MurmurHash    n = 100000, f = 0.057
Squid (Web proxy)          MD5           f = 0.09, k = 4
AIEngine (NIDS)            C++ hash      ℓ = 13, n = 5000, f = 0.45
NSRL (Forensic tool)       SHA-1         ℓ = 32, n ≈ 14 × 10^6, f = 8.08 × 10^-10
sdhash (Forensic tool)     SHA-1         ℓ = 11, n = 128, f = 0.0014

Bypassing a forensic tool

NSRL forensic tool

• A whitelist of “known safe files”
• Stored and distributed as a Bloom filter
• Maintained by NIST

A query-only attack: the goal is to hide a contraband file

• Adversary modifies the file to create a false positive
• Modification should be easily reversible
• The filter detects the file as safe
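The idea of a query-only attack can be illustrated on a toy filter. The sketch below is not the attack on NSRL itself: the parameters `M` and `K`, the way positions are sliced from a SHA-1 digest, and the padding scheme are all assumptions chosen so that a plain brute-force search succeeds quickly. Appended padding stands in for the "easily reversible" modification.

```python
import hashlib

M, K = 1024, 3          # toy parameters, far smaller than a real deployment

def positions(data: bytes) -> list:
    """K positions sliced from a SHA-1 digest (illustrative scheme)."""
    digest = hashlib.sha1(data).digest()
    return [int.from_bytes(digest[2 * i:2 * i + 2], "big") % M for i in range(K)]

def build_filter(items) -> list:
    bits = [False] * M
    for item in items:
        for p in positions(item):
            bits[p] = True
    return bits

def contains(bits, data: bytes) -> bool:
    return all(bits[p] for p in positions(data))

def forge_false_positive(bits, payload: bytes, max_tries: int = 1_000_000):
    """Append removable padding until the filter reports the file as whitelisted."""
    for counter in range(max_tries):
        candidate = payload + b"\x00pad" + str(counter).encode()
        if contains(bits, candidate):
            return candidate
    return None

whitelist = [f"safe-file-{i}".encode() for i in range(200)]
bits = build_filter(whitelist)
forged = forge_false_positive(bits, b"contraband")
print(forged, contains(bits, forged))
```

With a realistic false positive rate such as the one listed for NSRL, blind search of this kind would be far more expensive; the sketch only shows what the adversary is looking for.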

Countermeasure against chosen-insertion attacks

Use worst-case parameters for Bloom filters

• Fix m and n, and choose the k that minimizes the adversarial false positive probability

  $f^{adv} = \left(\frac{nk}{m}\right)^{k}$

Optimal values are

  $k^{adv}_{opt} = \frac{m}{en}$ and $f^{adv}_{opt} = e^{-m/(en)}$

• Impact: On a sample Bloom filter with m = 3200, n = 600
  • Average case: $k_{opt}$ = 4, $f_{opt}$ = 0.077
  • Worst case: $k^{adv}_{opt}$ = 2, $f^{adv}_{opt}$ = 0.1
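The optimal values quoted above follow from minimizing the logarithm of the adversarial false positive probability with respect to k, treating k as a real number. A short derivation, offered as a sketch of the calculation rather than a quotation from the original work:

```latex
\begin{align*}
\ln f^{adv}(k) &= k \ln\frac{nk}{m}, \\
\frac{d}{dk}\,\ln f^{adv}(k) &= \ln\frac{nk}{m} + 1 = 0
  \;\Longrightarrow\; k^{adv}_{opt} = \frac{m}{en}, \\
f^{adv}_{opt} &= \left(\frac{1}{e}\right)^{m/(en)} = e^{-m/(en)}.
\end{align*}
```

The second derivative is 1/k > 0, so this stationary point is a minimum. For m = 3200 and n = 600, m/(en) is about 1.96, which rounds to k = 2, and $e^{-1.96}$ is about 0.14, the same order of magnitude as the rounded worst-case value quoted above.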

Summary of other attacks & defenses

Attacks

Software app               Attacks
Scrapy (Web crawler)       chosen-insertion, query-only
Dablooms (Spam filter)     chosen-insertion, deletion
Squid (Web proxy)          chosen-insertion, query-only
AIEngine (NIDS)            query-only
sdhash (Forensic tool)     query-only

Defenses

• Use HMAC
• Use an alternate data structure: Bloom hash tables [Bloom 1970]
  • Resist chosen-insertion attacks better
  • Are often more memory efficient than Bloom filters
  • On average O(k) hash computations for items not in the table
  • On average O(ln k) for items in the table
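The first defense, keying the index computation with HMAC, can be sketched as follows. This is a minimal illustration assuming HMAC-SHA-256 with a one-byte counter per index; the class name `KeyedBloomFilter` and its layout are hypothetical and not taken from any of the applications listed above.

```python
import hashlib
import hmac
import os

class KeyedBloomFilter:
    """Bloom filter whose positions depend on a secret key (illustrative sketch)."""

    def __init__(self, m: int, k: int, key: bytes):
        self.m, self.k, self.key = m, k, key
        self.bits = bytearray(m)      # one byte per bit, for clarity over compactness

    def _positions(self, item: bytes):
        for i in range(self.k):
            tag = hmac.new(self.key, bytes([i]) + item, hashlib.sha256).digest()
            yield int.from_bytes(tag[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p] for p in self._positions(item))

bf = KeyedBloomFilter(m=1 << 16, k=4, key=os.urandom(32))
bf.add(b"http://example.com/")
print(b"http://example.com/" in bf)
```

Without the key, an adversary cannot predict which bits an item sets, so the offline search used in chosen-insertion and query-only attacks no longer applies directly. The key must stay secret, which rules out this defense for filters that are distributed publicly.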

Related work

• Algorithmic complexity attacks [Crosby et al. 2003]
  • DoS attacks against hash tables
  • Force hash tables to operate in O(n) instead of O(1)
  • Similar attacks against skip-lists, regular expressions, etc.
• Independent work on Bloom filters [Naor et al. 2015]
  • Provides a theoretical framework
  • Studies a query-only adversary: can only adaptively query the filter

Privacy: Safe Browsing

Google Safe Browsing in Mozilla Firefox

And many others

Advertised privacy policy

“We collect visited web pages, clickstream data or web address accessed, browser identifier and user ID” - WOT

“collects information including IP address, the origin of the search and may share this info with a third party” - Norton

Many Safe Browsing services are privacy unfriendly by design

“cannot determine the real URL from the information received” - Google

• Google seems to provide the most private service
• Hence the focus of this work

Google Safe Browsing: When, Why and How

• When: In 2008, by Google
• Goals: Protect from
  • Phishing sites
  • Malware sites
• How: Easy-to-use APIs in C, Python and PHP
• Methodology: Blacklists
• Available in: [figure: logos of the supporting software]
• Impact:
  • Billions of users
  • Detects thousands of new malicious websites per day
• Cloned by Yandex as Yandex Safe Browsing

Lookup API

• Google harvests phishing and malware URLs to feed a blacklist
• Client checks the status using a simple HTTP GET/POST request to sb-ssl.google.com/safebrowsing/api/lookup, here for example.com

Issues

• Does not scale: Heavy network traffic
• Privacy: URLs are sent in clear

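Improving privacy using a local cache

[Figure: an extended client holds a local cache. (1) Query and (2) Answer are exchanged between the client and its local cache; (3) Conditional query and (4) Answer are exchanged with the server, which is backed by the database.]

• Communication with the server is reduced
• Better privacy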

Google Safe Browsing API (v3): Local cache

• Blacklists

List name                   Description          Entries
goog-malware-shavar         malware              317807
googpub-phish-shavar        phishing             312621
goog-regtest-shavar         test file            29667
goog-unwanted-shavar        unwanted software
goog-whitedomain-shavar     unused               1

• Does not handle URLs directly; instead, it handles their SHA-256 digests

[Figure: www.evil.com is hashed with SHA-256; the first 32 bits of the digest, cc7af8a3, form the 32-bit prefix]

• Local cache contains prefixes

Client’s behavior chart

[Flowchart: user’s input URL → canonicalize and compute digest → prefix found in the local cache? If no: non-malicious URL. If yes: get full digests from the server → full digest found? If yes: malicious URL. If no: non-malicious URL.]
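The flow in the chart can be sketched in a few lines of Python. This is a simplified illustration: `check_url` and `fake_server` are hypothetical names, the server is replaced by an in-memory stand-in, and canonicalization is assumed to have been done already.

```python
import hashlib

def digest(expression: str) -> bytes:
    """Full SHA-256 digest of a canonicalized expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()

def check_url(decompositions, local_prefixes, get_full_digests) -> bool:
    """Return True if any decomposition is confirmed malicious."""
    for expression in decompositions:
        full = digest(expression)
        prefix = full[:4]                    # 32-bit prefix
        if prefix not in local_prefixes:
            continue                         # no hit in the cache: no request is sent
        if full in get_full_digests(prefix): # the prefix is revealed to the server here
            return True
    return False

# In-memory stand-in for the provider's database (assumption for the example).
BLACKLIST = {digest("www.evil.com/")}
LOCAL_PREFIXES = {d[:4] for d in BLACKLIST}

def fake_server(prefix: bytes) -> set:
    return {d for d in BLACKLIST if d[:4] == prefix}

print(check_url(["www.evil.com/index.html", "www.evil.com/"], LOCAL_PREFIXES, fake_server))
print(check_url(["www.example.com/"], LOCAL_PREFIXES, fake_server))
```

The only point at which the server learns anything is the call to `get_full_digests`, which is why the choice of prefixes placed in the local cache matters for privacy.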

Canonicalization and decompositions

• Input URL: http://usr:pwd@a.b.c:port/1/2.ext?param=1#frags
• Canonicalize(Input URL) → http://a.b.c/1/2.ext?param=1
• Canonicalization helps privacy too: it removes the username and password
• Multiple decompositions are checked for a single URL

Decompositions of the canonicalized URL

1  a.b.c/1/2.ext?param=1      5  b.c/1/2.ext?param=1
2  a.b.c/1/2.ext              6  b.c/1/2.ext
3  a.b.c/1/                   7  b.c/1/
4  a.b.c/                     8  b.c/

• Each matching prefix is sent to the server
• Any matching full digest ⇒ initial URL is malicious
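The eight decompositions listed above can be generated mechanically. The sketch below follows the host-suffix and path-prefix rule as commonly described for Safe Browsing (at most five host variants, at most six path variants); it is a simplified reading of that rule under assumptions, with a hypothetical function name, and it ignores special cases such as IP-address hosts.

```python
def decompositions(host: str, path: str, query: str = "") -> list:
    """Host-suffix / path-prefix expressions for an already canonicalized URL."""
    labels = host.split(".")
    hosts = [host]
    # Up to four extra hosts: start from the last five labels, strip leading
    # labels one at a time, and never go below two labels.
    for i in range(max(1, len(labels) - 5), len(labels) - 1):
        hosts.append(".".join(labels[i:]))

    paths = []
    if query:
        paths.append(f"{path}?{query}")      # exact path with query string
    paths.append(path)                       # exact path without query string
    # Directory prefixes "/", "/1/", ... (at most four of them).
    prefixes, current = ["/"], "/"
    for segment in [s for s in path.split("/") if s][:-1]:
        current += segment + "/"
        prefixes.append(current)
    for p in reversed(prefixes[:4]):         # deepest first, to match the list above
        if p not in paths:
            paths.append(p)

    return [h + p for h in hosts for p in paths]

for expression in decompositions("a.b.c", "/1/2.ext", "param=1"):
    print(expression)
```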

Purpose of computing decompositions

Memory saving

• A domain which hosts only malicious URLs
  • Naive blacklisting: Include all malicious prefixes in the local cache
  • Memory-efficient blacklisting: Include only the domain prefix

A more intricate example

[Figure: domain tree rooted at b.c, with subdomains a.b.c (pages a.b.c/1 and a.b.c/2), d.b.c and e.b.c; each link is marked as unsafe or safe]

• Naive blacklisting: Include a.b.c/1, a.b.c/2 and d.b.c
• Memory-efficient blacklisting: Include only a.b.c and d.b.c

Privacy of Google Safe Browsing

“Google cannot determine the real URL from the information received” - Google Safe Browsing v3 privacy policy

Our goal: A privacy analysis of Google and Yandex Safe Browsing

URL                              Prefix
www.evil.com                     cc7af8a3
www.example-1.com/11893474       cc7af8a3
www.example-2.com/5234456210     cc7af8a3
www.example-3.com/616445242      cc7af8a3

• Privacy due to the anonymity-set
• Estimate the anonymity-set size
• Does it suffice to have a large anonymity-set?

Tracking Safe Browsing users

Our assumptions

• Google and Yandex have incentives to behave maliciously
• They wish to learn whether a user visits some selected URLs

How Google can track Safe Browsing users

• Builds a list of prefixes to track
• Includes these prefixes in the client’s local cache
• Learns from the requests whether a user visited a specific URL
• Key parameter: Anonymity-set size

Estimating anonymity-set size

• Anonymity-set size of a prefix: number of URLs that yield the prefix

Year    URLs           Domains
2008    1 trillion     177 million
2012    30 trillion    252 million
2013    60 trillion    271 million

• Estimate the anonymity-set size: Apply the balls-into-bins model

                        Avg. for URLs                  Avg. for domains
Prefix length (bits)    2008      2012      2013       2008    2012    2013
16                      2^23      2^28      2^29       2700    3845    4135
32                      232       6984      13969      0.04    0.05    0.06
64                      0*        0*        0*         0*      0*      0*

0* is very close to 0

• Domains and URLs cannot be distinguished
• Anonymity-set size seems to be large
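The averages in the table follow from the balls-into-bins model: N items thrown uniformly into $2^{\ell}$ bins give N / $2^{\ell}$ items per bin on average. The sketch below recomputes a few entries. It assumes URL counts of 10^12, 3 × 10^13 and 6 × 10^13, which are the values that reproduce the 32-bit row of the table; the function name is illustrative.

```python
def average_anonymity_set(population: int, prefix_bits: int) -> float:
    """Average number of items sharing one prefix under balls-into-bins."""
    return population / 2 ** prefix_bits

# Assumed populations (see lead-in); domain counts are those listed above.
URLS = {2008: 10**12, 2012: 30 * 10**12, 2013: 60 * 10**12}
DOMAINS = {2008: 177 * 10**6, 2012: 252 * 10**6, 2013: 271 * 10**6}

for bits in (16, 32, 64):
    for year in (2008, 2012, 2013):
        print(bits, year,
              average_anonymity_set(URLS[year], bits),
              average_anonymity_set(DOMAINS[year], bits))
```

The 64-bit row makes the point of the next slide concrete: the average drops far below one, so a 64-bit prefix essentially identifies a single URL.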

Sending multiple prefixes

Example with two prefixes

Decomposition                     Prefix
petsymposium.org/2016/cfp.php     0xe70ee6d1
petsymposium.org/2016/            0x1d13ba6a
petsymposium.org/                 0x33a02ef5

Intuitively

• The prefix for petsymposium.org/ alone is not enough for re-identification
• Sending two 32-bit prefixes, for petsymposium.org/ and petsymposium.org/2016/, ≈ sending one 64-bit prefix
• The maximum anonymity-set size for 64-bit prefixes is 1 ⇒ should lead to re-identification



Applying adversary models

Factors enabling our attacks

bull Insecure hash functions

bull Digest truncation

bull High Bloom filter false positive rate

Vulnerable software applications

Software app Hashing Parameter info

Scrapy Web crawler NA NADablooms Spam filter MurmurHash n = 100000 f = 0057Squid Web proxy MD5 f = 009 k = 4AIEngine NIDS C++ hash ` = 13 n = 5000 f = 045NSRL Forensic tool SHA-1 ` = 32 n asymp 14times 106 f = 808times 10minus10

sdhash Forensic tool SHA-1 ` = 11 n = 128 f = 00014

12

Bypassing a forensic tool

NSRL forensic tool

bull A whitelist of ldquoknown safe filesrdquo

bull Stored and distributed as a Bloom filter

bull Maintained by NIST

A query-only attack Goal is to hide a contraband file

bull Adversary modifies the file to create a false positive

bull Modification should be easily reversible

bull The filter detects the file as safe

13

Countermeasure against chosen-insertion attacks

Use worst-case parameters for Bloom filters

bull Fix m n and choose k that minimizes false positive probability

f adv =

(nk

m

)k

Optimal values are

kadvopt =

m

enand f adv

opt = eminusmen

bull Impact On a sample Bloom filter with m = 3200 n = 600bull Average case kopt = 4 fopt = 0077bull Worst case kadv

opt = 2 f advopt = 01

14

Summary of other attacks amp defenses

Attacks

Software app Attacks

Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only

Defenses

bull Use HMAC

bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table

15

Related work

bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc

bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter

16

Privacy Safe Browsing

Google Safe Browsing in Mozilla Firefox

18

And many others

19

Adverted privacy policy

ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT

ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton

Many Safe Browsing services are privacy unfriendly by design

ldquocannot determine the real URL from the informationreceivedrdquo mdash Google

bull Google seems to provide the most private service

bull Hence focus of this work

20

Google Safe Browsing When Why and How

bull When In 2008 by Google

bull Goals Protect frombull Phishing sitesbull Malware sites

bull How Easy-to-use APIs in C Python and PHP

bull Methodology Blacklists

bull Available in

bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day

bull Cloned by Yandex as Yandex Safe Browsing 21

Lookup API

bull Google harvests phishing and malware URLs to feed a blacklist

bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom

Issues

bull Does not scale Heavy network traffic

bull Privacy URLs are sent in clear

22

Improving privacy using a local cache

Database

Server ClientLocal cache

(3) Conditionalquery

(4) Answer

(1) Query

(2) Answer

Extended client

bull Communication with the server is reduced

bull Better privacy

23

Google Safe Browsing API (v3) Local cache

bull BlacklistsList name Description Entries

goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1

bull Does not handle URLs directly instead their SHA-256 digests

wwwevilcom SHA-256 cc7af8a31918

cc7af8a31918

32-bit prefix

bull Local cache contains prefixes 24

Clientrsquos behavior chart

Userrsquos input

URLCanonicalize andcompute digest

Foundprefix

Get fulldigests

Founddigest

MaliciousURL

Non-maliciousURL

yes

no

yes

no

25

Canonicalization and decompositions

bull Input URLhttpusrpwdabcport12extparam=1frags

bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password

bull Multiple decompositions are checked for a single URL

Decompositions of canonicalized URL

1 abc12extparam=12 abc12ext3 abc14 abc

5 bc12extparam=16 bc12ext7 bc18 bc

bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious

26

Purpose of computing decompositions

Memory saving

bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix

A more intricate example

Unsafe link Safe link

dbcebc

bcabc2

abc1abc

bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc

27

Privacy of Google Safe Browsing

ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy

Our goal A privacy analysis of Google and Yandex Safe Browsing

URL Prefix

wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3

bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set

28

Tracking Safe Browsing users

Our assumptions

• Google and Yandex have incentives to behave maliciously
• They wish to learn whether a user visits some selected URLs

How Google can track Safe Browsing users

• Builds a list of prefixes to track
• Includes these prefixes in the client’s local cache
• Learns from the requests whether a user visited a specific URL
• Key parameter: anonymity-set size

Estimating anonymity-set size

• Anonymity-set size of a prefix: number of URLs that yield the prefix

Year   URLs          Domains
2008   1 trillion    177 million
2012   30 trillion   252 million
2013   60 trillion   271 million

• Estimate anonymity-set size: apply the balls-into-bins model

                        Avg. for URLs            Avg. for domains
Prefix length (bits)    2008   2012   2013       2008   2012   2013
16                      2^23   2^28   2^29       2700   3845   4135
32                      232    6984   13969      0.04   0.05   0.06
64                      0*     0*     0*         0*     0*     0*

0* is very close to 0

• Domains and URLs cannot be distinguished
• Anonymity-set size seems to be large
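The averages in the table follow from the balls-into-bins model: n items thrown uniformly into 2^ℓ bins give n / 2^ℓ items per bin on average. The sketch below recomputes them; the URL counts are taken on the 10^12 scale, which is an assumption chosen because it is the scale that reproduces the tabulated values (for example 232 for 32-bit prefixes in 2008).

```python
def average_anonymity_set(num_items: float, prefix_bits: int) -> float:
    """Expected number of items sharing one prefix of the given length."""
    return num_items / 2 ** prefix_bits

# Counts per year (URL scale is an assumption, see the lead-in).
URLS = {2008: 1e12, 2012: 30e12, 2013: 60e12}
DOMAINS = {2008: 177e6, 2012: 252e6, 2013: 271e6}

for bits in (16, 32, 64):
    for year in sorted(URLS):
        urls = average_anonymity_set(URLS[year], bits)
        domains = average_anonymity_set(DOMAINS[year], bits)
        print(f"{bits:2d} bits, {year}: URLs {urls:.3g}, domains {domains:.3g}")
```

For 32-bit prefixes the average is in the hundreds or thousands for URLs but well below 1 for domains, which is why a single domain prefix is much more identifying than a single URL prefix.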

Sending multiple prefixes

Example with two prefixes

Decomposition                    Prefix
petsymposium.org/2016/cfp.php    0xe70ee6d1
petsymposium.org/2016/           0x1d13ba6a
petsymposium.org/                0x33a02ef5

Intuitively

• The prefix for petsymposium.org/ alone is not enough for re-identification
• Sending two 32-bit prefixes, for petsymposium.org/ and petsymposium.org/2016/, ≈ sending one 64-bit prefix
• The maximum anonymity-set size for 64-bit prefixes is 1 ⇒ should lead to re-identification

Ambiguity on two prefixes

• More than two distinct URLs may yield the same two prefixes
• Consider a user visiting a.b.c with prefixes A and B in the local cache

                        URL       Decomposition   Prefix
Target URL              a.b.c     a.b.c           A
                                  b.c             B
Ambiguity   Type I      g.a.b.c   g.a.b.c         C
                                  a.b.c           A
                                  b.c             B
            Type II     g.b.c     g.b.c           A ← collision on 32 bits
                                  b.c             B
            Type III    d.e.f     d.e.f           A ← collision on 32 bits
                                  e.f             B ← collision on 32 bits

• P[Type III] = 1/2^64
• Type II URLs exist only when the number of decompositions on the domain > 2^32
• P[Type I] > P[Type II] > P[Type III]
• Mainly, only Type I URLs create ambiguity in re-identification

How to track a given URL: a real-world example (I)

[Figure: URL tree rooted at petsymposium.org/, with children petsymposium.org/2015/ and petsymposium.org/2016/ (the target URL). The leaves petsymposium.org/2016/links.php, petsymposium.org/2016/cfp.php and petsymposium.org/2016/faqs.php are the Type I URLs for the target URL.]

Goal: identify users interested in PETs

• Target URL is petsymposium.org/2016/

How to track a given URL: a real-world example (II)

• Target URL has Type I ambiguity with cfp.php, links.php and faqs.php

Decomposition                     Prefix
petsymposium.org/2016/faqs.php    0xaec10b3a
petsymposium.org/2016/            0x1d13ba6a
petsymposium.org/                 0x33a02ef5

• Including 2 prefixes in the local cache ⇒ anonymity-set size of 4

• To remove any ambiguity:
  • Need to include 3 additional prefixes, for cfp.php, links.php and faqs.php
  • A total of 5 prefixes

• Server receives 2 prefixes ⇒ visited page is the target URL
• Server receives 3 prefixes ⇒ visited page is one of the leaf URLs
• The third prefix decides which leaf URL was visited
• Generalizable to any number of prefixes
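The server-side reasoning in this example can be sketched as follows. The prefix values are the ones shown on the slides; the prefix for links.php is not given in the slides, so it is left out rather than invented. The names `TRACKED`, `TARGET` and `infer_visited` are hypothetical.

```python
# Prefixes placed in the client's local cache (values from the slides).
TRACKED = {
    0x33A02EF5: "petsymposium.org/",
    0x1D13BA6A: "petsymposium.org/2016/",
    0xE70EE6D1: "petsymposium.org/2016/cfp.php",
    0xAEC10B3A: "petsymposium.org/2016/faqs.php",
    # The prefix for links.php is not listed in the source, so it is omitted.
}
TARGET = {0x33A02EF5, 0x1D13BA6A}  # domain prefix + target URL prefix

def infer_visited(received_prefixes):
    """Guess the visited page from the prefixes a client sent in one request."""
    received = set(received_prefixes)
    if not TARGET <= received:
        return None                      # not a visit below the target URL
    extra = received - TARGET
    if not extra:
        return TRACKED[0x1D13BA6A]       # exactly 2 prefixes: the target itself
    if len(extra) == 1:
        return TRACKED.get(next(iter(extra)))  # 3 prefixes: a leaf URL
    return None

print(infer_visited([0x33A02EF5, 0x1D13BA6A]))
print(infer_visited([0x33A02EF5, 0x1D13BA6A, 0xAEC10B3A]))
```

The same lookup extends to deeper URL trees by tracking one prefix per node on the path plus one per sibling that must be told apart.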

Examples of URLs creating multiple hits

• Over 1300 such URLs distributed over 30 domains
• More frequent in Yandex than in Google Safe Browsing

Google

URL: http://wps3b.17buddies.net/wp/cs_sub_7-2.pwf
  Matching decompositions: 17buddies.net/wp/cs_sub_7-2.pwf, 17buddies.net/wp/
URL: http://www.1001cartes.org/tag/emergency-issues
  Matching decompositions: 1001cartes.org/tag/emergency-issues, 1001cartes.org/tag/

Yandex

URL: http://fr.xhamster.com/user/video
  Matching decompositions: fr.xhamster.com/, xhamster.com/
URL: http://nl.xhamster.com/user/video
  Matching decompositions: nl.xhamster.com/, xhamster.com/
URL: http://m.wickedpictures.com/user/login
  Matching decompositions: m.wickedpictures.com/, wickedpictures.com/
URL: http://m.mofos.com/user/login
  Matching decompositions: m.mofos.com/, mofos.com/
URL: http://mobile.teenslovehugecocks.com/user/join
  Matching decompositions: mobile.teenslovehugecocks.com/, teenslovehugecocks.com/

• Including a single prefix for xhamster.com/ blacklists both fr.xhamster.com/ and nl.xhamster.com/
• There is no need to add an additional prefix for the French or Dutch version

Responsible disclosure and impact

• Disclosure to Mozilla Firefox

“We have long assumed (without the math to back it up) that if Google were evil it could seed the list with prefixes that allowed it to detect whether a few users visited a few select targets” - Mozilla Firefox

• Disclosure to Yandex

“We can’t promise but we plan to study them and provide you with our feedback” - Yandex Safe Browsing team

• Non-disclosure agreement with Google
• Launch of Google Safe Browsing API v4 (in June 2016)

“Google does learn the hash prefixes of URLs but the hash prefixes don’t provide much information about the actual URLs” - Google Safe Browsing v4 privacy policy

Conclusions & Future Work

Conclusions

Lesson learnt: collisions are hard to tame in security and privacy

Bloom filters

• Developers tend to ignore the worst case of algorithms
• Data structures with ad-hoc crypto primitives are risky at best
• Need for secure instantiations, e.g. as in Count-Min sketches

Safe Browsing

• Re-establishes the weakness of the anonymity-set privacy model
• Google and Yandex both employ the same privacy model
  • Google is privacy aware
  • Yandex less so

Future work

Bloom filters

• On Bloom filters: Bloom paradox [Rottenstreich 2015]
• Beyond Bloom filters: security of Bloom filter variants

Safe Browsing

• Accountability: need for a decentralized blacklist management system [Freudiger et al. 2015]
• Privacy: can a local cache improve Private Information Retrieval?

Thank you

Other works not covered today

• Performance of cryptographic accumulators
• Private password auditing
• (In)Security of Google and Yandex Safe Browsing
• Alerting websites: risks and solutions
• Decompression quines and anti-viruses
• Pitfalls of hashing for privacy
• Linkable (zero-knowledge) proofs for private and accountable gossip

Bypassing a forensic tool

NSRL forensic tool

• A whitelist of “known safe files”
• Stored and distributed as a Bloom filter
• Maintained by NIST

A query-only attack: the goal is to hide a contraband file

• Adversary modifies the file to create a false positive
• Modification should be easily reversible
• The filter detects the file as safe

Countermeasure against chosen-insertion attacks

Use worst-case parameters for Bloom filters

• Fix m and n, and choose the k that minimizes the false positive probability

$f^{adv} = \left(\frac{nk}{m}\right)^{k}$

Optimal values are

$k^{adv}_{opt} = \frac{m}{en}$ and $f^{adv}_{opt} = e^{-\frac{m}{en}}$

• Impact: on a sample Bloom filter with m = 3200, n = 600
  • Average case: $k_{opt} = 4$, $f_{opt} = 0.077$
  • Worst case: $k^{adv}_{opt} = 2$, $f^{adv}_{opt} = 0.1$
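The sample figures can be checked numerically. In the sketch below the worst-case formula is the one from the slide; the average-case formula (1 - e^(-kn/m))^k with k ≈ (m/n) ln 2 is the standard Bloom filter approximation and is an assumption here, since the slide only quotes the resulting numbers. Function names are hypothetical.

```python
import math

def average_case(m: int, n: int):
    """Standard approximation: k = (m/n) ln 2, f = (1 - e^(-kn/m))^k."""
    k = max(1, round((m / n) * math.log(2)))
    f = (1 - math.exp(-k * n / m)) ** k
    return k, f

def worst_case(m: int, n: int):
    """Adversarial insertions set nk distinct bits: f = (nk/m)^k, k = m/(en)."""
    k = max(1, round(m / (math.e * n)))
    f = (n * k / m) ** k
    return k, f

m, n = 3200, 600
print("average case:", average_case(m, n))
print("worst case:  ", worst_case(m, n))
```

With k rounded to an integer, the worst-case false positive probability evaluates to about 0.14, which is presumably what the slide reports as 0.1.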

Summary of other attacks & defenses

Attacks

Software    Application     Attacks
Scrapy      Web crawler     chosen-insertion, query-only
Dablooms    Spam filter     chosen-insertion, deletion
Squid       Web proxy       chosen-insertion, query-only
AIEngine    NIDS            query-only
sdhash      Forensic tool   query-only

Defenses

• Use HMAC
• Use an alternate data structure: Bloom hash tables [Bloom 1970]
  • Better resists chosen-insertion attacks
  • Is often more memory efficient than Bloom filters
  • On average O(k) hash computations for items not in the table
  • On average O(ln k) for items in the table
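The "Use HMAC" defense can be illustrated with a small keyed Bloom filter. This is a minimal sketch under stated assumptions, not the construction used by any of the listed applications: the class name `KeyedBloomFilter`, the use of HMAC-SHA-256 with a one-byte counter to derive the k indexes, and the 8-byte truncation are all illustrative choices.

```python
import hashlib
import hmac
import os

class KeyedBloomFilter:
    """Bloom filter whose k indexes come from HMAC under a secret key, so an
    adversary who does not know the key cannot precompute colliding items."""

    def __init__(self, m: int, k: int, key: bytes = None):
        self.m, self.k = m, k
        self.key = key if key is not None else os.urandom(32)
        self.bits = bytearray((m + 7) // 8)

    def _indexes(self, item: bytes):
        for i in range(self.k):  # assumes k < 256
            mac = hmac.new(self.key, bytes([i]) + item, hashlib.sha256).digest()
            yield int.from_bytes(mac[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indexes(item))

bf = KeyedBloomFilter(m=3200, k=2)
bf.add(b"http://example.com/")
print(b"http://example.com/" in bf)
```

The key must stay secret and server-side; this fits filters that are queried through an interface, but not filters that are distributed to untrusted clients together with their hash functions.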

Related work

• Algorithmic complexity attacks [Crosby et al. 2003]
  • DoS attacks against hash tables
  • Force hash tables to operate in O(n) instead of O(1)
  • Similar attacks against skip-lists, regular expressions, etc.
• Independent work on Bloom filters [Naor et al. 2015]
  • Provides a theoretical framework
  • Studies a query-only adversary: it can only adaptively query the filter

Privacy: Safe Browsing

Google Safe Browsing in Mozilla Firefox

And many others

Advertised privacy policy

“We collect visited web pages, clickstream data or web address accessed, browser identifier and user ID” - WOT

“collects information including IP address, the origin of the search and may share this info with a third party” - Norton

Many Safe Browsing services are privacy-unfriendly by design

“cannot determine the real URL from the information received” - Google

• Google seems to provide the most private service
• Hence the focus of this work

Google Safe Browsing: When, Why and How

• When: in 2008, by Google
• Goals: protect from
  • Phishing sites
  • Malware sites
• How: easy-to-use APIs in C#, Python and PHP
• Methodology: blacklists
• Available in: [product logos not reproduced]
• Impact:
  • Billions of users
  • Detects thousands of new malicious websites per day
• Cloned by Yandex as Yandex Safe Browsing

Lookup API

• Google harvests phishing and malware URLs to feed a blacklist
• Client checks the status of a URL (e.g. example.com) using a simple HTTP GET/POST request to sb-ssl.google.com/safebrowsing/api/lookup

Issues

• Does not scale: heavy network traffic
• Privacy: URLs are sent in clear

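Improving privacy using a local cache

[Figure: the extended client holds a local cache. It first sends (1) a query to the local cache and gets (2) an answer; only then does it send (3) a conditional query to the server, which is backed by the database, and receive (4) an answer.]

• Communication with the server is reduced
• Better privacy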



Summary of other attacks amp defenses

Attacks

Software app Attacks

Scrapy Web crawler chosen-insertion query-onlyDablooms Spam filter chosen-insertion deletionSquid Web proxy chosen-insertion query-onlyAIEngine NIDS query-onlysdhash Forensic tool query-only

Defenses

bull Use HMAC

bull Use an alternate data structure Bloom hash tables [Bloom 1970]bull Resists better to chosen-insertion attacksbull Is often more memory efficient than Bloom filtersbull On average O(k) hash computations for items not in the tablebull On average O(ln k) for items in the table

15

Related work

bull Algorithmic complexity attacks [Crosby et al 2003]bull DoS attacks against hash tablesbull Force hash tables to operate in O(n) instead of O(1)bull Similar attacks against skip-lists regular expressions etc

bull Independent work on Bloom filters [Naor et al 2015]bull Provide a theoretical frameworkbull Study a query-only adversary Can only adaptively query the filter

16

Privacy Safe Browsing

Google Safe Browsing in Mozilla Firefox

18

And many others

19

Adverted privacy policy

ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT

ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton

Many Safe Browsing services are privacy unfriendly by design

ldquocannot determine the real URL from the informationreceivedrdquo mdash Google

bull Google seems to provide the most private service

bull Hence focus of this work

20

Google Safe Browsing When Why and How

bull When In 2008 by Google

bull Goals Protect frombull Phishing sitesbull Malware sites

bull How Easy-to-use APIs in C Python and PHP

bull Methodology Blacklists

bull Available in

bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day

bull Cloned by Yandex as Yandex Safe Browsing 21

Lookup API

bull Google harvests phishing and malware URLs to feed a blacklist

bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom

Issues

bull Does not scale Heavy network traffic

bull Privacy URLs are sent in clear

22

Improving privacy using a local cache

Database

Server ClientLocal cache

(3) Conditionalquery

(4) Answer

(1) Query

(2) Answer

Extended client

bull Communication with the server is reduced

bull Better privacy

23

Google Safe Browsing API (v3) Local cache

bull BlacklistsList name Description Entries

goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1

bull Does not handle URLs directly instead their SHA-256 digests

wwwevilcom SHA-256 cc7af8a31918

cc7af8a31918

32-bit prefix

bull Local cache contains prefixes 24

Clientrsquos behavior chart

Userrsquos input

URLCanonicalize andcompute digest

Foundprefix

Get fulldigests

Founddigest

MaliciousURL

Non-maliciousURL

yes

no

yes

no

25

Canonicalization and decompositions

bull Input URLhttpusrpwdabcport12extparam=1frags

bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password

bull Multiple decompositions are checked for a single URL

Decompositions of canonicalized URL

1 abc12extparam=12 abc12ext3 abc14 abc

5 bc12extparam=16 bc12ext7 bc18 bc

bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious

26

Purpose of computing decompositions

Memory saving

bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix

A more intricate example

Unsafe link Safe link

dbcebc

bcabc2

abc1abc

bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc

27

Privacy of Google Safe Browsing

ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy

Our goal A privacy analysis of Google and Yandex Safe Browsing

URL Prefix

wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3

bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set

28

Tracking Safe Browsing users

Our assumptions

• Google and Yandex have incentives to behave maliciously

• They wish to learn whether a user visits some selected URLs

How Google can track Safe Browsing users

• Builds a list of prefixes to track

• Includes these prefixes in the client’s local cache

• Learns from the requests whether a user visited a specific URL

• Key parameter: anonymity-set size

Estimating anonymity-set size

• Anonymity-set size of a prefix: number of URLs that yield the prefix

Year   URLs          Domains
2008   1 trillion    177 million
2012   30 trillion   252 million
2013   60 trillion   271 million

• Estimate anonymity-set size: apply the balls-into-bins model

                       Avg. for URLs              Avg. for domains
Prefix length (bits)   2008     2012     2013     2008   2012   2013
16                     2^{23}   2^{28}   2^{29}   2700   3845   4135
32                     232      6984     13969    0.04   0.05   0.06
64                     0*       0*       0*       0*     0*     0*

0* is very close to 0

• Domains and URLs cannot be distinguished
• Anonymity-set size seems to be large
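
The averages in the table follow from the balls-into-bins model: m items thrown uniformly into 2^ℓ bins give m / 2^ℓ items per bin on average. A minimal sketch, assuming the URL counts are in trillions (which is what the tabulated 32-bit averages imply); the function and variable names are illustrative.

```python
def average_anonymity_set(num_items: int, prefix_bits: int) -> float:
    """Average number of items sharing one prefix: m / 2^l."""
    return num_items / 2 ** prefix_bits


# Assumed inputs: URL counts in trillions, domain counts in millions.
URLS = {2008: 10**12, 2012: 30 * 10**12, 2013: 60 * 10**12}
DOMAINS = {2008: 177 * 10**6, 2012: 252 * 10**6, 2013: 271 * 10**6}

for bits in (16, 32, 64):
    for year in sorted(URLS):
        print(bits, year,
              average_anonymity_set(URLS[year], bits),
              average_anonymity_set(DOMAINS[year], bits))
```

For 64-bit prefixes the average is far below 1, which is why a pair of 32-bit prefixes is so much more identifying than a single one.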

Sending multiple prefixes

Example with two prefixes

Decomposition                   Prefix
petsymposium.org/2016/cfp.php   0xe70ee6d1
petsymposium.org/2016/          0x1d13ba6a
petsymposium.org/               0x33a02ef5

Intuitively

• Prefix for petsymposium.org/ is not enough for re-identification

• Sending two 32-bit prefixes for petsymposium.org/ and petsymposium.org/2016/ ≈ sending one 64-bit prefix

• The maximum anonymity-set size for 64-bit prefixes is 1 ⇒ should lead to re-identification

Ambiguity on two prefixes

• More than two distinct URLs may yield the same two prefixes
• Consider a user visiting a.b.c with prefixes A and B in local cache

URL                 Decomposition   Prefix
Target URL: a.b.c   a.b.c/          A
                    b.c/            B
Ambiguity
Type I: g.a.b.c     g.a.b.c/        C
                    a.b.c/          A
                    b.c/            B
Type II: g.b.c      g.b.c/          A ← collision on 32 bits
                    b.c/            B
Type III: d.e.f     d.e.f/          A ← collision on 32 bits
                    e.f/            B ← collision on 32 bits

• P[Type III] = 1/2^{64} (two simultaneous 32-bit collisions)

• Type II URLs exist only when the number of decompositions on the domain > 2^{32}

• P[Type I] > P[Type II] > P[Type III]

• In practice, only Type I URLs create ambiguity in re-identification

How to track a given URL: A real-world example (I)

[Figure: site tree rooted at petsymposium.org/ with children petsymposium.org/2015/ and petsymposium.org/2016/; the pages petsymposium.org/2016/links.php, petsymposium.org/2016/cfp.php and petsymposium.org/2016/faqs.php are the Type I URLs for the target URL petsymposium.org/2016/]

Goal: identify users interested in PETs

• Target URL is petsymposium.org/2016/

How to track a given URL: A real-world example (II)

• Target URL has Type I ambiguity with cfp.php, links.php, faqs.php

Decomposition                    Prefix
petsymposium.org/2016/faqs.php   0xaec10b3a
petsymposium.org/2016/           0x1d13ba6a
petsymposium.org/                0x33a02ef5

• Including 2 prefixes in the local cache ⇒ anonymity-set size of 4

• To remove any ambiguity
  • Need to include 3 additional prefixes, for cfp.php, links.php and faqs.php
  • A total of 5 prefixes

• Server receives 2 prefixes ⇒ visited page is the target URL

• Server receives 3 prefixes ⇒ visited page is one of the leaf URLs

• The third prefix decides which leaf URL was visited

• Generalizable to any number of prefixes
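
The re-identification logic above can be sketched as follows. This is a toy model under stated assumptions: the decompositions are hard-coded and simplified, the names `tracked`, `prefixes_sent` and `reidentify` are hypothetical, and the prefixes are computed from the plain strings shown, so they need not equal the hexadecimal values in the table.

```python
import hashlib


def prefix(expression: str) -> str:
    return hashlib.sha256(expression.encode("utf-8")).digest()[:4].hex()


ROOT, TARGET = "petsymposium.org/", "petsymposium.org/2016/"
LEAVES = [TARGET + page for page in ("cfp.php", "links.php", "faqs.php")]

# Decompositions the client checks for each visited page (simplified).
DECOMPOSITIONS = {
    ROOT: [ROOT],
    "petsymposium.org/2015/": ["petsymposium.org/2015/", ROOT],
    TARGET: [TARGET, ROOT],
}
DECOMPOSITIONS.update({leaf: [leaf, TARGET, ROOT] for leaf in LEAVES})

# Prefixes planted in the local cache: 2 for the target + 3 for the leaves.
tracked = {prefix(e): e for e in [ROOT, TARGET, *LEAVES]}


def prefixes_sent(url: str) -> set:
    """Client side: prefixes of the URL's decompositions that hit the cache."""
    return {p for p in map(prefix, DECOMPOSITIONS[url]) if p in tracked}


def reidentify(received: set):
    """Server side: with 2 or more hits, the most specific expression wins."""
    hits = [tracked[p] for p in received]
    return max(hits, key=len) if len(hits) >= 2 else None
```

With the five prefixes in place, two received prefixes single out the target and a third one names the leaf; a lone hit on the domain prefix stays ambiguous.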

Examples of URLs creating multiple hits

• Over 1300 such URLs distributed over 30 domains
• More frequent in Yandex than in Google Safe Browsing

URL                                              Matching decomposition

Google
http://wps3b.17buddies.net/wp/cs_sub_7-2.pwf     17buddies.net/wp/cs_sub_7-2.pwf
                                                 17buddies.net/wp/
http://www.1001cartes.org/tag/emergency-issues   1001cartes.org/tag/emergency-issues
                                                 1001cartes.org/tag/

Yandex
http://fr.xhamster.com/user/video                fr.xhamster.com/
                                                 xhamster.com/
http://nl.xhamster.com/user/video                nl.xhamster.com/
                                                 xhamster.com/
http://m.wickedpictures.com/user/login           m.wickedpictures.com/
                                                 wickedpictures.com/
http://m.mofos.com/user/login                    m.mofos.com/
                                                 mofos.com/
http://mobile.teenslovehugecocks.com/user/join   mobile.teenslovehugecocks.com/
                                                 teenslovehugecocks.com/

• Including a single prefix for xhamster.com/ blacklists both fr.xhamster.com/ and nl.xhamster.com/

• No need to add an additional prefix for the French or Dutch version

Responsible disclosure and impact

• Disclosure to Mozilla Firefox

“We have long assumed (without the math to back it up) that if Google were evil it could seed the list with prefixes that allowed it to detect whether a few users visited a few select targets” – Mozilla Firefox

• Disclosure to Yandex

“We can’t promise but we plan to study them and provide you with our feedback” – Yandex Safe Browsing team

• Non-disclosure agreement with Google
• Launch of Google Safe Browsing API v4 (in June 2016)

“Google does learn the hash prefixes of URLs but the hash prefixes don’t provide much information about the actual URLs” – Google Safe Browsing v4 privacy policy

Conclusions & Future Work

Conclusions

Lesson learnt: collisions are hard to tame in security and privacy

Bloom filters

• Developers tend to ignore the worst case of algorithms

• Data structures with ad-hoc crypto primitives are at best risky

• Need for secure instantiations, e.g. as in Count-Min sketches

Safe Browsing

• Re-establishes the weakness of the anonymity-set privacy model

• Google and Yandex both employ the same privacy model
  • Google is privacy aware
  • Yandex less so

Future work

Bloom filters

• On Bloom filters: Bloom paradox [Rottenstreich 2015]

• Beyond Bloom filters: security of Bloom filter variants

Safe Browsing

• Accountability: need for a decentralized blacklist management system [Freudiger et al. 2015]

• Privacy: can a local cache improve Private Information Retrieval?

Thank you


Other works not covered today

• Performance of cryptographic accumulators

• Private password auditing

• (In)Security of Google and Yandex Safe Browsing

• Alerting websites: risks and solutions

• Decompression quines and anti-viruses

• Pitfalls of hashing for privacy

• Linkable (zero-knowledge) proofs for private and accountable gossip

Related work

• Algorithmic complexity attacks [Crosby et al. 2003]
  • DoS attacks against hash tables
  • Force hash tables to operate in O(n) instead of O(1)
  • Similar attacks against skip-lists, regular expressions, etc.

• Independent work on Bloom filters [Naor et al. 2015]
  • Provide a theoretical framework
  • Study a query-only adversary: can only adaptively query the filter
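
To make the O(n) versus O(1) point concrete, the toy experiment below counts key comparisons while filling a Python dict. It is only an illustration with a deliberately degenerate hash (the `Key` class and its `forced_hash` parameter are hypothetical), not the attack described by Crosby et al.; a real attacker has to find inputs that collide under the target's actual hash function.

```python
class Key:
    """Dict key whose hash can be forced, with a global comparison counter."""

    comparisons = 0

    def __init__(self, value, forced_hash=None):
        self.value = value
        self._hash = hash(value) if forced_hash is None else forced_hash

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        Key.comparisons += 1
        return self.value == other.value


def comparisons_to_build(keys) -> int:
    """Insert all keys into a dict and return how many __eq__ calls it took."""
    Key.comparisons = 0
    table = {}
    for k in keys:
        table[k] = True
    return Key.comparisons


n = 500
benign = comparisons_to_build([Key(i) for i in range(n)])
adversarial = comparisons_to_build([Key(i, forced_hash=0) for i in range(n)])
print(benign, adversarial)
```

When every key lands in the same bucket, each insertion has to be compared against the keys already stored, so building the table costs a quadratic number of comparisons overall.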

Privacy: Safe Browsing

Google Safe Browsing in Mozilla Firefox


And many others


Advertised privacy policy

“We collect visited web pages, clickstream data or web address accessed, browser identifier and user ID” – WOT

“collects information including IP address, the origin of the search and may share this info with a third party” – Norton

Many Safe Browsing services are privacy unfriendly by design

“cannot determine the real URL from the information received” – Google

• Google seems to provide the most private service

• Hence the focus of this work

Google Safe Browsing: When, Why and How

• When: in 2008, by Google

• Goals: protect from
  • Phishing sites
  • Malware sites

• How: easy-to-use APIs in C, Python and PHP

• Methodology: blacklists

• Available in [logos in the original slide]

• Impact
  • Billions of users
  • Detects thousands of new malicious websites per day

• Cloned by Yandex as Yandex Safe Browsing

Lookup API

• Google harvests phishing and malware URLs to feed a blacklist

• Client checks the status of a URL (here example.com) using a simple HTTP GET/POST request to sb-ssl.google.com/safebrowsing/api/lookup

Issues

• Does not scale: heavy network traffic

• Privacy: URLs are sent in clear

Improving privacy using a local cache

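[Figure: the extended client consists of the client and its local cache. (1) The client queries the local cache and (2) gets an answer; (3) a conditional query is then sent to the server, which is backed by the database, and (4) the server answers]

• Communication with the server is reduced

• Better privacy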

Google Safe Browsing in Mozilla Firefox

18

And many others

19

Adverted privacy policy

ldquoWe collect visited web pages clickstream data or web addressaccessed browser identifier and user IDrdquo mdash WOT

ldquocollects information including IP address the origin of thesearch and may share this info with a third partyrdquo mdash Norton

Many Safe Browsing services are privacy unfriendly by design

ldquocannot determine the real URL from the informationreceivedrdquo mdash Google

bull Google seems to provide the most private service

bull Hence focus of this work

20

Google Safe Browsing When Why and How

bull When In 2008 by Google

bull Goals Protect frombull Phishing sitesbull Malware sites

bull How Easy-to-use APIs in C Python and PHP

bull Methodology Blacklists

bull Available in

bull Impactbull Billions of usersbull Detects thousands of new malicious websites per day

bull Cloned by Yandex as Yandex Safe Browsing 21

Lookup API

bull Google harvests phishing and malware URLs to feed a blacklist

bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom

Issues

bull Does not scale Heavy network traffic

bull Privacy URLs are sent in clear

22

Improving privacy using a local cache

Database

Server ClientLocal cache

(3) Conditionalquery

(4) Answer

(1) Query

(2) Answer

Extended client

bull Communication with the server is reduced

bull Better privacy

23

Google Safe Browsing API (v3) Local cache

bull BlacklistsList name Description Entries

goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1

bull Does not handle URLs directly instead their SHA-256 digests

wwwevilcom SHA-256 cc7af8a31918

cc7af8a31918

32-bit prefix

bull Local cache contains prefixes 24

Clientrsquos behavior chart

Userrsquos input

URLCanonicalize andcompute digest

Foundprefix

Get fulldigests

Founddigest

MaliciousURL

Non-maliciousURL

yes

no

yes

no

25

Canonicalization and decompositions

bull Input URLhttpusrpwdabcport12extparam=1frags

bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password

bull Multiple decompositions are checked for a single URL

Decompositions of canonicalized URL

1 abc12extparam=12 abc12ext3 abc14 abc

5 bc12extparam=16 bc12ext7 bc18 bc

bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious

26

Purpose of computing decompositions

Memory saving

bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix

A more intricate example

Unsafe link Safe link

dbcebc

bcabc2

abc1abc

bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc

27

Privacy of Google Safe Browsing

ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy

Our goal A privacy analysis of Google and Yandex Safe Browsing

URL Prefix

wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3

bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set

28

Tracking Safe Browsing users

Our assumptions

bull Google and Yandex have incentives to behave maliciously

bull Wish to learn whether a user visits some selected URLs

How Google can track Safe Browsing users

bull Builds a list of prefixes to track

bull Includes these prefixes in the clientrsquos local cache

bull Learns from the requests whether a user visited a specific URL

bull Key parameter Anonymity-set size

29

Estimating anonymity-set size

bull Anonymity-set size of a prefix URLs that yield the prefix

Year URLs Domains

2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million

bull Estimate anonymity-set size Apply balls-into-bins model

Avg for URLs Avg for Domains

Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast

0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large

30

Sending multiple prefixes

Example with two prefixes

Decomposition Prefix

petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5

Intuitively

bull Prefix for petsymposiumorg is not enough for re-identification

bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix

bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification

31

Ambiguity on two prefixes

bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache

URL Decomposition Prefix

Target URL abcabc A

bc B

Ambiguity

Type I gabcgabc C

abc A

bc B

Type II gbcgbc A larr collision on 32 bits

bc B

Type III defdef A larr collision on 32 bits

ef B larr collision on 32 bits

bull P[Type III] = 1264

bull Type II URLs exist only when decomp on the domain gt 232

bull P[Type I] gt P[Type II] gt P[Type III]

bull Mainly only Type I URLs create ambiguity in re-identification 32

How to track a given URL A real-world example (I)

petsymposiumorg2015

petsymposiumorg

petsymposiumorg2016linksphp

petsymposiumorg2016cfpphp

petsymposiumorg2016faqsphp

Type I URLs for target URL

petsymposiumorg2016

Goal Identify users interested in PETs

bull Target URL is petsymposiumorg2016

33

How to track a given URL A real-world example (II)

bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp

Decomposition Prefix

petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5

bull Including 2 prefixes in the local cache rArr Anonymity set size of 4

bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes

bull Server receives 2 prefixes rArr visited page is the target URL

bull Server receives 3 prefixes rArr visited page is either of the leaf URLs

bull The third prefix decides which leaf URL was visited

bull Generalizable to any number of prefixes 34

Examples of URLs creating multiple hits

bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing

URL matching decomposition

Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf

17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp

httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues

1001cartesorgtag

Yandex

httpfrxhamstercomuservideofrxhamstercomxhamstercom

httpnlxhamstercomuservideonlxhamstercomxhamstercom

httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom

httpmmofoscomuserloginmmofoscommofoscom

httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom

teenslovehugecockscom

bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom

bull No need to add additional prefix for French or Dutch version35

Responsible disclosure and impact

• Disclosure to Mozilla Firefox

“We have long assumed (without the math to back it up) that if Google were evil it could seed the list with prefixes that allowed it to detect whether a few users visited a few select targets” -- Mozilla Firefox

• Disclosure to Yandex

“We can’t promise but we plan to study them and provide you with our feedback” -- Yandex Safe Browsing team

• Non-disclosure agreement with Google

• Launch of Google Safe Browsing API v4 (in June 2016)

“Google does learn the hash prefixes of URLs but the hash prefixes don’t provide much information about the actual URLs” -- Google Safe Browsing v4 privacy policy

Conclusions & Future Work

Conclusions

Lesson learnt: Collisions are hard to tame in security and privacy

Bloom filters

• Developers tend to ignore the worst case of algorithms

• Data structures with ad-hoc crypto primitives are at best risky

• Need for secure instantiations, e.g. as in Count-Min sketches

Safe Browsing

• Re-establishes the weakness of the anonymity-set privacy model

• Google and Yandex both employ the same privacy model
  • Google is privacy aware
  • Yandex less so

Future work

Bloom filters

• On Bloom filters: Bloom paradox [Rottenstreich 2015]

• Beyond Bloom filters: Security of Bloom filter variants

Safe Browsing

• Accountability: Need of a decentralized blacklist management system [Freudiger et al 2015]

• Privacy: Can local cache improve Private Information Retrieval?

Thank you

Other works not covered today

• Performance of cryptographic accumulators

• Private password auditing

• (In)Security of Google and Yandex Safe Browsing

• Alerting websites: Risks and solutions

• Decompression quines and anti-viruses

• Pitfalls of hashing for privacy

• Linkable (zero-knowledge) proofs for private and accountable gossip

And many others


Advertised privacy policy

“We collect visited web pages, clickstream data or web address accessed, browser identifier and user ID” -- WOT

“collects information including IP address, the origin of the search and may share this info with a third party” -- Norton

Many Safe Browsing services are privacy unfriendly by design

“cannot determine the real URL from the information received” -- Google

• Google seems to provide the most private service

• Hence the focus of this work

Google Safe Browsing: When, Why and How

• When: In 2008 by Google

• Goals: Protect from
  • Phishing sites
  • Malware sites

• How: Easy-to-use APIs in C, Python and PHP

• Methodology: Blacklists

• Available in: [product logos, images not recovered]

• Impact:
  • Billions of users
  • Detects thousands of new malicious websites per day

• Cloned by Yandex as Yandex Safe Browsing

Lookup API

• Google harvests phishing and malware URLs to feed a blacklist

• Client checks the status of a URL (here example.com) using a simple HTTP GET/POST request to sb-ssl.google.com/safebrowsing/api/lookup

Issues

• Does not scale: Heavy network traffic

• Privacy: URLs are sent in clear

Improving privacy using a local cache

[Figure: a Server backed by a Database and an Extended client made of a Client and a Local cache. Labelled steps: (1) Query, (2) Answer, (3) Conditional query, (4) Answer.]

• Communication with the server is reduced

• Better privacy

Google Safe Browsing API (v3): Local cache

• Blacklists

List name                  Description          Entries
goog-malware-shavar        malware              317807
googpub-phish-shavar       phishing             312621
goog-regtest-shavar        test file            29667
goog-unwanted-shavar       unwanted software
goog-whitedomain-shavar    unused               1

• Does not handle URLs directly, but instead their SHA-256 digests

[Figure: www.evil.com → SHA-256 → digest beginning cc7af8a31918; the leading part of the digest is marked as the 32-bit prefix]

• Local cache contains prefixes
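As a rough illustration of the digest-to-prefix step, the following Python sketch truncates a SHA-256 digest to its leading bits using the standard `hashlib` module. The function names `digest_prefix` and `prefix_hex` are made up for this example, and the output for `www.evil.com/` is not claimed to match the value on the slide, because the exact canonical string hashed there is not given.

```python
import hashlib

def digest_prefix(expression, bits=32):
    """Return the leading `bits` bits of SHA-256(expression) as an integer."""
    digest = hashlib.sha256(expression.encode("utf-8")).digest()  # 32 bytes
    return int.from_bytes(digest, "big") >> (256 - bits)

def prefix_hex(expression, bits=32):
    """Same prefix as zero-padded hexadecimal (bits must be a multiple of 4)."""
    return format(digest_prefix(expression, bits), "0{}x".format(bits // 4))

if __name__ == "__main__":
    expression = "www.evil.com/"  # assumed canonical form, for illustration only
    print("full digest  :", hashlib.sha256(expression.encode("utf-8")).hexdigest())
    print("32-bit prefix:", prefix_hex(expression))
```

The local cache then only needs 4 bytes per blacklisted expression instead of the 32 bytes of a full digest.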

Client’s behavior chart

[Flowchart: User’s input URL → Canonicalize and compute digest → Found prefix? If no: Non-malicious URL. If yes: Get full digests → Found digest? If yes: Malicious URL. If no: Non-malicious URL.]
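The decision flow of the chart can be written down as a short Python sketch. It is not the real client: the function `check`, the in-memory `fake_server`, the example hostnames and the choice to return the prefixes sent to the server are all assumptions for illustration. Canonicalization is taken as already done.

```python
import hashlib

def sha256(expression):
    """Full 32-byte SHA-256 digest of a canonicalized expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()

def check(expressions, local_prefixes, get_full_digests):
    """Follow the flowchart for the canonicalized expressions of one URL.

    Returns (verdict, prefixes_sent_to_server).
    """
    digests = [sha256(e) for e in expressions]
    hits = sorted({d[:4] for d in digests if d[:4] in local_prefixes})
    if not hits:                                  # "Found prefix?" -> no
        return "non-malicious", []
    full_digests = get_full_digests(hits)         # "Get full digests"
    if any(d in full_digests for d in digests):   # "Found digest?" -> yes
        return "malicious", hits
    return "non-malicious", hits

# Hypothetical server-side blacklist and the local cache derived from it.
BLACKLIST = {sha256("evil.example/")}
LOCAL_CACHE = {d[:4] for d in BLACKLIST}

def fake_server(prefixes):
    """Stand-in for the server: full digests matching the received prefixes."""
    return {d for d in BLACKLIST if d[:4] in prefixes}

if __name__ == "__main__":
    print(check(["evil.example/"], LOCAL_CACHE, fake_server)[0])
    print(check(["good.example/"], LOCAL_CACHE, fake_server)[0])
```

Note that the server sees the matching prefixes even when the final verdict is non-malicious, which is the observation the privacy analysis builds on.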

Canonicalization and decompositions

• Input URL: http://usr:pwd@a.b.c:port/1/2.ext?param=1#frags

• Canonicalize(Input URL) → http://a.b.c/1/2.ext?param=1
  • Canonicalization for privacy too: Removes username and password

• Multiple decompositions are checked for a single URL

Decompositions of canonicalized URL

1. a.b.c/1/2.ext?param=1
2. a.b.c/1/2.ext
3. a.b.c/1/
4. a.b.c/
5. b.c/1/2.ext?param=1
6. b.c/1/2.ext
7. b.c/1/
8. b.c/

• Each matching prefix is sent to the server

• Any matching full digest ⇒ Initial URL is malicious
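The eight decompositions listed above can be reproduced with a small Python sketch. It follows the host-suffix and path-prefix scheme as commonly described for Safe Browsing (at most 5 host suffixes, never the top-level label alone; exact path with and without query, plus up to 4 directory prefixes). The function name `decompositions` is an assumption, the input is expected to be already canonicalized, and IP-address hosts are not handled.

```python
from urllib.parse import urlsplit

def decompositions(canonical_url):
    """Host-suffix/path-prefix expressions of an already canonicalized URL."""
    parts = urlsplit(canonical_url)
    host = parts.hostname
    path = parts.path or "/"

    # Exact host, then suffixes of the last 5 labels, stopping before the TLD.
    labels = host.split(".")
    hosts = [host]
    start = max(1, len(labels) - 5)
    for i in range(start, len(labels) - 1):
        hosts.append(".".join(labels[i:]))

    # Exact path with query, exact path, then directory prefixes (deepest first).
    paths = []
    if parts.query:
        paths.append(path + "?" + parts.query)
    paths.append(path)
    dirs = ["/"]
    for segment in [s for s in path.split("/") if s][:-1]:
        dirs.append(dirs[-1] + segment + "/")
    paths.extend(reversed(dirs[:4]))
    paths = list(dict.fromkeys(paths))  # drop duplicates, keep order

    return [h + p for h in hosts for p in paths]

if __name__ == "__main__":
    for number, expression in enumerate(
            decompositions("http://a.b.c/1/2.ext?param=1"), start=1):
        print(number, expression)
```

Each expression is then hashed and its prefix looked up in the local cache, so a single visited URL may trigger several prefix matches.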

Purpose of computing decompositions

Memory saving

• A domain which hosts only malicious URLs
  • Naive blacklisting: Include all malicious prefixes in the local cache
  • Memory-efficient blacklisting: Include only the domain prefix

A more intricate example

[Figure: link tree for the domain b.c with the nodes d.b.c, e.b.c, b.c, a.b.c/2, a.b.c/1 and a.b.c; the legend distinguishes unsafe links from safe links]

• Naive blacklisting: Include a.b.c/1, a.b.c/2 and d.b.c

• Memory-efficient blacklisting: Include only a.b.c and d.b.c

Privacy of Google Safe Browsing

“Google cannot determine the real URL from the information received” -- Google Safe Browsing v3 privacy policy

Our goal: A privacy analysis of Google and Yandex Safe Browsing

URL                             Prefix
www.evil.com/                   cc7af8a3
www.example-1.com/11893474      cc7af8a3
www.example-2.com/5234456210    cc7af8a3
www.example-3.com/616445242     cc7af8a3

• Privacy due to anonymity-set

• Estimate the anonymity-set size

• Does it suffice to have a large anonymity-set?

Tracking Safe Browsing users

Our assumptions

• Google and Yandex have incentives to behave maliciously

• Wish to learn whether a user visits some selected URLs

How Google can track Safe Browsing users

• Builds a list of prefixes to track

• Includes these prefixes in the client’s local cache

• Learns from the requests whether a user visited a specific URL

• Key parameter: Anonymity-set size

Estimating anonymity-set size

• Anonymity-set size of a prefix: number of URLs that yield the prefix

Year    URLs           Domains
2008    1 trillion     177 million
2012    30 trillion    252 million
2013    60 trillion    271 million

• Estimate anonymity-set size: Apply balls-into-bins model

                        Avg. for URLs            Avg. for domains
Prefix length (bits)    2008    2012    2013     2008    2012    2013
16                      2^23    2^28    2^29     2700    3845    4135
32                      232     6984    13969    0.04    0.05    0.06
64                      0*      0*      0*       0*      0*      0*

0* is very close to 0

• Domains and URLs cannot be distinguished

• Anonymity-set size seems to be large
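The averages in the table follow from the balls-into-bins model: n items hashed uniformly into 2^ℓ prefixes give n / 2^ℓ items per prefix on average. The Python sketch below recomputes them. The URL counts are taken at the 10^12 scale (1, 30 and 60 × 10^12), an assumption made here because it is the scale that reproduces the 32-bit entries 232, 6984 and 13969; the function name `average_anonymity_set` is illustrative.

```python
def average_anonymity_set(n_items, prefix_bits):
    """Balls-into-bins average: n items spread uniformly over 2**prefix_bits prefixes."""
    return n_items / 2 ** prefix_bits

# Assumption: URL counts at the 10^12 scale, domain counts in millions.
URLS = {2008: 1e12, 2012: 30e12, 2013: 60e12}
DOMAINS = {2008: 177e6, 2012: 252e6, 2013: 271e6}

if __name__ == "__main__":
    for bits in (16, 32, 64):
        url_row = [average_anonymity_set(URLS[y], bits) for y in sorted(URLS)]
        dom_row = [average_anonymity_set(DOMAINS[y], bits) for y in sorted(DOMAINS)]
        print(bits,
              ["{:.3g}".format(v) for v in url_row],
              ["{:.3g}".format(v) for v in dom_row])
```

The slide's figures appear to be these quotients truncated rather than rounded, and for 64-bit prefixes the average falls far below one URL per prefix.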

Sending multiple prefixes

Example with two prefixes

Decomposition                    Prefix
petsymposium.org/2016/cfp.php    0xe70ee6d1
petsymposium.org/2016/           0x1d13ba6a
petsymposium.org/                0x33a02ef5

Intuitively

• Prefix for petsymposium.org/ is not enough for re-identification

• Sending two 32-bit prefixes for petsymposium.org/ and petsymposium.org/2016/ ≈ sending one 64-bit prefix

• The maximum anonymity-set size for 64-bit prefixes is 1 ⇒ Should lead to re-identification

Ambiguity on two prefixes

• More than two distinct URLs may yield the same two prefixes

• Consider a user visiting a.b.c with prefixes A and B in local cache

              URL        Decomposition    Prefix
Target URL    a.b.c      a.b.c            A
                         b.c              B
Ambiguity
Type I        g.a.b.c    g.a.b.c          C
                         a.b.c            A
                         b.c              B
Type II       g.b.c      g.b.c            A ← collision on 32 bits
                         b.c              B
Type III      d.e.f      d.e.f            A ← collision on 32 bits
                         e.f              B ← collision on 32 bits

• P[Type III] = 1/2^64

• Type II URLs exist only when the number of decompositions on the domain > 2^32

• P[Type I] > P[Type II] > P[Type III]

• Mainly, only Type I URLs create ambiguity in re-identification
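The value of P[Type III] can be recovered with a one-line derivation, assuming (as in the balls-into-bins model) that 32-bit prefixes of unrelated decompositions behave as independent uniform values. Here $\mathrm{pre}_{32}(\cdot)$ is shorthand introduced for this note and denotes the 32-bit prefix of the SHA-256 digest; an unrelated URL d.e.f must collide with both cached prefixes at once:

```latex
\[
P[\text{Type III}]
  = P\big[\mathrm{pre}_{32}(\texttt{d.e.f}) = A\big]\cdot
    P\big[\mathrm{pre}_{32}(\texttt{e.f}) = B\big]
  = \frac{1}{2^{32}}\cdot\frac{1}{2^{32}}
  = \frac{1}{2^{64}}
\]
```

A Type II URL needs only one such collision because it shares the decomposition b.c with the target, and a Type I URL needs none, which matches the ordering P[Type I] > P[Type II] > P[Type III].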




Lookup API

bull Google harvests phishing and malware URLs to feed a blacklist

bull Client checks the status using a simple HTTP GETPOST requestsb-sslgooglecomsafebrowsingapilookupexamplecom

Issues

bull Does not scale Heavy network traffic

bull Privacy URLs are sent in clear

22

Improving privacy using a local cache

Database

Server ClientLocal cache

(3) Conditionalquery

(4) Answer

(1) Query

(2) Answer

Extended client

bull Communication with the server is reduced

bull Better privacy

23

Google Safe Browsing API (v3) Local cache

bull BlacklistsList name Description Entries

goog-malware-shavar malware 317807googpub-phish-shavar phishing 312621goog-regtest-shavar test file 29667goog-unwanted-shavar unwanted software goog-whitedomain-shavar unused 1

bull Does not handle URLs directly instead their SHA-256 digests

wwwevilcom SHA-256 cc7af8a31918

cc7af8a31918

32-bit prefix

bull Local cache contains prefixes 24

Clientrsquos behavior chart

Userrsquos input

URLCanonicalize andcompute digest

Foundprefix

Get fulldigests

Founddigest

MaliciousURL

Non-maliciousURL

yes

no

yes

no

25

Canonicalization and decompositions

bull Input URLhttpusrpwdabcport12extparam=1frags

bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password

bull Multiple decompositions are checked for a single URL

Decompositions of canonicalized URL

1 abc12extparam=12 abc12ext3 abc14 abc

5 bc12extparam=16 bc12ext7 bc18 bc

bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious

26

Purpose of computing decompositions

Memory saving

bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix

A more intricate example

Unsafe link Safe link

dbcebc

bcabc2

abc1abc

bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc

27

Privacy of Google Safe Browsing

ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy

Our goal A privacy analysis of Google and Yandex Safe Browsing

URL Prefix

wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3

bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set

28

Tracking Safe Browsing users

Our assumptions

bull Google and Yandex have incentives to behave maliciously

bull Wish to learn whether a user visits some selected URLs

How Google can track Safe Browsing users

bull Builds a list of prefixes to track

bull Includes these prefixes in the clientrsquos local cache

bull Learns from the requests whether a user visited a specific URL

bull Key parameter Anonymity-set size

29

Estimating anonymity-set size

• Anonymity-set size of a prefix: number of URLs that yield the prefix

Year    URLs           Domains
2008    1 trillion     177 million
2012    30 trillion    252 million
2013    60 trillion    271 million

• Estimate anonymity-set size: Apply balls-into-bins model

                        Avg. for URLs           Avg. for domains
Prefix length (bits)    2008    2012    2013    2008    2012    2013
16                      2^23    2^28    2^29    2700    3845    4135
32                      232     6984    13969   0.04    0.05    0.06
64                      0*      0*      0*      0*      0*      0*

0*: very close to 0

• Domains and URLs cannot be distinguished
• Anonymity-set size seems to be large
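The averages in the table follow from the balls-into-bins model: m items thrown uniformly into 2^ℓ bins give m / 2^ℓ items per bin on average. The sketch below recomputes them in Python. The function name is hypothetical, and the URL counts are taken as 10^12, 3·10^13 and 6·10^13, an assumption under which the tabulated values are reproduced (for instance 10^12 / 2^32 ≈ 232).

```python
def average_anonymity_set(num_items, prefix_bits):
    """Expected number of items per prefix for uniformly distributed digests."""
    return num_items / 2 ** prefix_bits


# Counts assumed for the computation (see the lead-in).
URLS = {2008: 10**12, 2012: 30 * 10**12, 2013: 60 * 10**12}
DOMAINS = {2008: 177 * 10**6, 2012: 252 * 10**6, 2013: 271 * 10**6}

for bits in (16, 32, 64):
    for year in (2008, 2012, 2013):
        print(
            bits,
            year,
            average_anonymity_set(URLS[year], bits),
            average_anonymity_set(DOMAINS[year], bits),
        )
```

The model only gives an average; it says nothing about how an adversary who chooses the prefixes can shrink the set for a particular target.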

Sending multiple prefixes

Example with two prefixes

Decomposition                    Prefix
petsymposium.org/2016/cfp.php    0xe70ee6d1
petsymposium.org/2016/           0x1d13ba6a
petsymposium.org/                0x33a02ef5

Intuitively

• Prefix for petsymposium.org/ is not enough for re-identification

• Sending two 32-bit prefixes for petsymposium.org/ and petsymposium.org/2016/ ≈ sending one 64-bit prefix

• The maximum anonymity-set size for 64-bit prefixes is 1 ⇒ Should lead to re-identification

Ambiguity on two prefixes

• More than two distinct URLs may yield the same two prefixes
• Consider a user visiting a.b.c with prefixes A and B in local cache

              URL        Decomposition    Prefix
Target URL    a.b.c      a.b.c/           A
                         b.c/             B

Ambiguity
Type I        g.a.b.c    g.a.b.c/         C
                         a.b.c/           A
                         b.c/             B
Type II       g.b.c      g.b.c/           A  ← collision on 32 bits
                         b.c/             B
Type III      d.e.f      d.e.f/           A  ← collision on 32 bits
                         e.f/             B  ← collision on 32 bits

• P[Type III] = 1/2^64

• Type II URLs exist only when the number of decompositions on the domain is > 2^32

• P[Type I] > P[Type II] > P[Type III]

• Mainly, only Type I URLs create ambiguity in re-identification

How to track a given URL: A real-world example (I)

[Figure: URL tree rooted at petsymposium.org/ with children petsymposium.org/2015/ and petsymposium.org/2016/; the leaves petsymposium.org/2016/links.php, petsymposium.org/2016/cfp.php and petsymposium.org/2016/faqs.php are labeled "Type I URLs for target URL"]

Goal: Identify users interested in PETs

• Target URL is petsymposium.org/2016/

How to track a given URL: A real-world example (II)

• Target URL has Type I ambiguity with cfp.php, links.php and faqs.php

Decomposition                     Prefix
petsymposium.org/2016/faqs.php    0xaec10b3a
petsymposium.org/2016/            0x1d13ba6a
petsymposium.org/                 0x33a02ef5

• Including 2 prefixes in the local cache ⇒ Anonymity-set size of 4

• To remove any ambiguity
  • Need to include 3 additional prefixes for cfp.php, links.php, faqs.php
  • A total of 5 prefixes

• Server receives 2 prefixes ⇒ visited page is the target URL

• Server receives 3 prefixes ⇒ visited page is one of the leaf URLs

• The third prefix decides which leaf URL was visited

• Generalizable to any number of prefixes
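The counting argument above can be replayed with a short Python sketch. It rests on several assumptions: readable expressions stand in for their 32-bit prefixes (so Type II and Type III collisions are ignored), the site map `SITE` is limited to the six pages of the example with page names reconstructed as cfp.php, links.php and faqs.php, and the helper names are hypothetical.

```python
def expressions(url):
    """Host-suffix/path-prefix expressions of a scheme-less URL (simplified)."""
    host, _, path = url.partition("/")
    labels = host.split(".")
    hosts = [".".join(labels[i:]) for i in range(max(len(labels) - 1, 1))]
    segments = [s for s in path.split("/") if s]
    paths = ["/" + path]
    for i in range(len(segments) - 1, -1, -1):
        prefix = "/" + "".join(s + "/" for s in segments[:i])
        if prefix not in paths:
            paths.append(prefix)
    return {h + p for h in hosts for p in paths}


ROOT = "petsymposium.org/"
TARGET = ROOT + "2016/"
LEAVES = [TARGET + page for page in ("links.php", "cfp.php", "faqs.php")]
SITE = [ROOT, ROOT + "2015/", TARGET] + LEAVES


def hits(url, cache):
    """Cached expressions matched by url, i.e. what the server learns."""
    return frozenset(expressions(url) & cache)


def anonymity_set(observed, cache):
    """Pages of the site that would trigger exactly the observed hits."""
    return [u for u in SITE if hits(u, cache) == observed]


two = {ROOT, TARGET}     # 2 tracked prefixes
five = two | set(LEAVES)  # 5 tracked prefixes

ambiguous = anonymity_set(frozenset(two), two)
print(len(ambiguous))                                # size with 2 prefixes
print(anonymity_set(hits(TARGET, five), five))       # size with 5 prefixes
```

With two cached entries the target is indistinguishable from its three leaf pages; with five entries every page of the example produces a distinct set of hits.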

Examples of URLs creating multiple hits

• Over 1300 such URLs distributed over 30 domains
• More frequent in Yandex than in Google Safe Browsing

URL and matching decompositions

Google
  http://wps3b.17buddies.net/wp/cs_sub_7-2.pwf
    17buddies.net/wp/cs_sub_7-2.pwf
    17buddies.net/wp/
  http://www.1001cartes.org/tag/emergency-issues
    1001cartes.org/tag/emergency-issues
    1001cartes.org/tag/

Yandex
  http://fr.xhamster.com/user/video
    fr.xhamster.com/
    xhamster.com/
  http://nl.xhamster.com/user/video
    nl.xhamster.com/
    xhamster.com/
  http://m.wickedpictures.com/user/login
    m.wickedpictures.com/
    wickedpictures.com/
  http://m.mofos.com/user/login
    m.mofos.com/
    mofos.com/
  http://mobile.teenslovehugecocks.com/user/join
    mobile.teenslovehugecocks.com/
    teenslovehugecocks.com/

• Including a single prefix for xhamster.com/ blacklists both fr.xhamster.com and nl.xhamster.com

• No need to add an additional prefix for the French or Dutch version

Responsible disclosure and impact

• Disclosure to Mozilla Firefox

"We have long assumed (without the math to back it up) that if Google were evil it could seed the list with prefixes that allowed it to detect whether a few users visited a few select targets" (Mozilla Firefox)

• Disclosure to Yandex

"We can't promise but we plan to study them and provide you with our feedback" (Yandex Safe Browsing team)

• Non-disclosure agreement with Google
• Launch of Google Safe Browsing API v4 (in June 2016)

"Google does learn the hash prefixes of URLs but the hash prefixes don't provide much information about the actual URLs" (Google Safe Browsing v4 privacy policy)

Conclusions & Future Work

Conclusions

Lesson learnt: Collisions are hard to tame in security and privacy

Bloom filters

• Developers tend to ignore the worst case of algorithms

• Data structures with ad-hoc crypto primitives are risky at best

• Need for secure instantiations, e.g., as in Count-Min sketches

Safe Browsing

• Re-establishes the weakness of the anonymity-set privacy model

• Google and Yandex both employ the same privacy model
  • Google is privacy aware
  • Yandex less so

Future work

Bloom filters

• On Bloom filters: Bloom paradox [Rottenstreich 2015]

• Beyond Bloom filters: Security of Bloom filter variants

Safe Browsing

• Accountability: Need for a decentralized blacklist management system [Freudiger et al. 2015]

• Privacy: Can a local cache improve Private Information Retrieval?

Thank you

Other works not covered today

• Performance of cryptographic accumulators

• Private password auditing

• (In)Security of Google and Yandex Safe Browsing

• Alerting websites: Risks and solutions

• Decompression quines and anti-viruses

• Pitfalls of hashing for privacy

• Linkable (zero-knowledge) proofs for private and accountable gossip

Improving privacy using a local cache

[Figure: a server with its database and an extended client made of a client and a local cache; message labels: (1) Query, (2) Answer, (3) Conditional query, (4) Answer]

• Communication with the server is reduced

• Better privacy

Google Safe Browsing API (v3): Local cache

• Blacklists

List name                  Description          Entries
goog-malware-shavar        malware              317807
googpub-phish-shavar       phishing             312621
goog-regtest-shavar        test file            29667
goog-unwanted-shavar       unwanted software
goog-whitedomain-shavar    unused               1

• Does not handle URLs directly, but their SHA-256 digests instead

[Figure: www.evil.com/ → SHA-256 → digest beginning cc7af8a31918; its 32-bit prefix is cc7af8a3]

• Local cache contains prefixes
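Deriving a cache entry from an expression can be sketched with Python's standard `hashlib` module. The function name `hash_prefix` is hypothetical, and the exact byte string that the API hashes (canonical form, trailing slash, encoding) is an assumption here, so the printed value is not claimed to match the digest shown on the slide.

```python
import hashlib


def hash_prefix(expression: str, bits: int = 32) -> str:
    """Hex-encoded leading bits of the SHA-256 digest of an expression."""
    if bits % 8 != 0:
        raise ValueError("this sketch only handles whole bytes")
    digest = hashlib.sha256(expression.encode("utf-8")).digest()
    return digest[: bits // 8].hex()


prefix = hash_prefix("www.evil.com/")
print(prefix)  # 8 hexadecimal characters, i.e. 32 bits
```

Storing 4 bytes per entry instead of the full 32-byte digest is what keeps the local cache small, at the price of collisions between unrelated URLs.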





Canonicalization and decompositions

bull Input URLhttpusrpwdabcport12extparam=1frags

bull Canonicalize(Input URL) rarr httpabc12extparam=1bull Canonicalization for privacy too Removes username and password

bull Multiple decompositions are checked for a single URL

Decompositions of canonicalized URL

1 abc12extparam=12 abc12ext3 abc14 abc

5 bc12extparam=16 bc12ext7 bc18 bc

bull Each matching prefix is sent to the serverbull Any matching full digest rArr Initial URL is malicious

26

Purpose of computing decompositions

Memory saving

bull A domain which hosts only malicious URLsbull Naive blacklisting Include all malicious prefixes in the local cachebull Memory-efficient blacklisting Include only the domain prefix

A more intricate example

Unsafe link Safe link

dbcebc

bcabc2

abc1abc

bull Naive blacklisting Include abc1 abc2 and dbcbull Memory-efficient blacklisting Include only abc and dbc

27

Privacy of Google Safe Browsing

ldquoGoogle cannot determine the real URL from the informationreceivedrdquo mdash Google Safe Browsing v3 privacy policy

Our goal A privacy analysis of Google and Yandex Safe Browsing

URL Prefix

wwwevilcom cc7af8a3wwwexample-1com11893474 cc7af8a3wwwexample-2com5234456210 cc7af8a3wwwexample-3com616445242 cc7af8a3

bull Privacy due to anonymity-setbull Estimate the anonymity-set sizebull Does it suffice to have a large anonymity-set

28

Tracking Safe Browsing users

Our assumptions

bull Google and Yandex have incentives to behave maliciously

bull Wish to learn whether a user visits some selected URLs

How Google can track Safe Browsing users

bull Builds a list of prefixes to track

bull Includes these prefixes in the clientrsquos local cache

bull Learns from the requests whether a user visited a specific URL

bull Key parameter Anonymity-set size

29

Estimating anonymity-set size

bull Anonymity-set size of a prefix URLs that yield the prefix

Year URLs Domains

2008 1 Billion 177 Million2012 30 Billion 252 Million2013 60 Billion 271 Million

bull Estimate anonymity-set size Apply balls-into-bins model

Avg for URLs Avg for Domains

Prefix length (bits) 2008 2012 2013 2008 2012 201316 223 228 229 2700 3845 413532 232 6984 13969 004 005 00664 0lowast 0lowast 0lowast 0lowast 0lowast 0lowast

0lowast is very close to 0bull Domains and URLs cannot be distinguishedbull Anonymity-set size seems to be large

30

Sending multiple prefixes

Example with two prefixes

Decomposition Prefix

petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5

Intuitively

bull Prefix for petsymposiumorg is not enough for re-identification

bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix

bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification

31

Ambiguity on two prefixes

• More than two distinct URLs may yield the same two prefixes

• Consider a user visiting a.b.c with prefixes A and B in the local cache

              URL        Decomposition    Prefix
Target URL    a.b.c      a.b.c            A
                         b.c              B
Ambiguity
Type I        g.a.b.c    g.a.b.c          C
                         a.b.c            A
                         b.c              B
Type II       g.b.c      g.b.c            A  ← collision on 32 bits
                         b.c              B
Type III      d.e.f      d.e.f            A  ← collision on 32 bits
                         e.f              B  ← collision on 32 bits

• P[Type III] = 1/2^64

• Type II URLs exist only when the number of decompositions on the domain > 2^32

• P[Type I] > P[Type II] > P[Type III]

• Mainly, only Type I URLs create ambiguity in re-identification
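A Type I URL needs no hash collision at all: it shares the target's prefixes simply because the target's decompositions are among its own. The sketch below checks that condition on host names only, as in the a.b.c example. The helper names are hypothetical, and the Type III probability assumes the two 32-bit prefixes behave as independent uniform values.

```python
def host_decompositions(host: str) -> list[str]:
    """Host suffixes with at least two labels, e.g. g.a.b.c -> g.a.b.c, a.b.c, b.c."""
    labels = host.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]

def is_type_one(target: str, candidate: str) -> bool:
    """True if every decomposition of the target is also a decomposition of a
    different candidate, so the candidate hits the same cached prefixes
    without any hash collision."""
    t = set(host_decompositions(target))
    c = set(host_decompositions(candidate))
    return candidate != target and t <= c

# Probability that an unrelated URL collides on both 32-bit prefixes (Type III),
# assuming the two prefixes are independent and uniform.
p_type_three = (1 / 2**32) ** 2

print(is_type_one("a.b.c", "g.a.b.c"))  # g.a.b.c contains a.b.c and b.c
print(is_type_one("a.b.c", "g.b.c"))    # g.b.c lacks a.b.c: would need a collision
print(p_type_three == 1 / 2**64)
```

Type II and Type III both require at least one 32-bit collision, which is why Type I dominates in practice.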

How to track a given URL? A real-world example (I)

Goal: Identify users interested in PETs

• Target URL is petsymposium.org/2016/

[Figure: URL tree rooted at petsymposium.org/, with children petsymposium.org/2015/ and petsymposium.org/2016/ (the target). The pages petsymposium.org/2016/links.php, petsymposium.org/2016/cfp.php and petsymposium.org/2016/faqs.php are marked as the Type I URLs for the target URL.]

How to track a given URL? A real-world example (II)

• Target URL has Type I ambiguity with cfp.php, links.php and faqs.php

Decomposition                      Prefix
petsymposium.org/2016/faqs.php     0xaec10b3a
petsymposium.org/2016/             0x1d13ba6a
petsymposium.org/                  0x33a02ef5

• Including 2 prefixes in the local cache ⇒ anonymity-set size of 4

• To remove any ambiguity:
    • Need to include 3 additional prefixes for cfp.php, links.php and faqs.php
    • A total of 5 prefixes

• Server receives 2 prefixes ⇒ visited page is the target URL

• Server receives 3 prefixes ⇒ visited page is one of the leaf URLs

• The third prefix decides which leaf URL was visited

• Generalizable to any number of prefixes
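The inference on this slide can be simulated end to end. The sketch below is a toy model, not the Safe Browsing implementation. It assumes SHA-256 with 32-bit prefixes, a simplified path-only decomposition, and a site whose only pages under /2016/ are the three leaves shown; any other page there would again be a Type I URL for the target. All names are hypothetical.

```python
import hashlib
from typing import Optional

def prefix32(expression: str) -> str:
    """First 32 bits of the SHA-256 digest, as 8 hex characters."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:4].hex()

def decompositions(url: str) -> list[str]:
    """Path-prefix decompositions of a scheme-less, single-host URL (simplified)."""
    host, _, path = url.partition("/")
    parts = [p for p in path.split("/") if p]
    out = [host + "/"]
    for i in range(1, len(parts)):
        out.append(host + "/" + "/".join(parts[:i]) + "/")
    if parts:
        out.append(host + "/" + path)
    return out

BASE = "petsymposium.org/"
TARGET = BASE + "2016/"
LEAVES = [TARGET + page for page in ("cfp.php", "links.php", "faqs.php")]

# The 5 tracked expressions a malicious provider would push to the local cache.
tracked = {prefix32(e): e for e in [BASE, TARGET, *LEAVES]}

def client_request(url: str) -> set[str]:
    """Prefixes the client sends: those of its decompositions found in the cache."""
    return {prefix32(d) for d in decompositions(url)} & set(tracked)

def server_infer(received: set[str]) -> Optional[str]:
    """Re-identify the visited page from the set of received prefixes."""
    hits = {tracked[p] for p in received}
    leaf = [u for u in LEAVES if u in hits]
    if len(hits) == 3 and len(leaf) == 1:
        return leaf[0]   # the third prefix selects the leaf URL
    if hits == {BASE, TARGET}:
        return TARGET    # exactly two prefixes: the target itself
    return None          # e.g. only the domain prefix: still ambiguous

print(server_infer(client_request(TARGET)))
print(server_infer(client_request(TARGET + "faqs.php")))
print(server_infer(client_request(BASE + "2015/")))
```

A visit to petsymposium.org/2015/ sends only the domain prefix, so the server learns the domain but not the page, which matches the intuition that a single prefix is not enough for re-identification.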

Examples of URLs creating multiple hits

• Over 1300 such URLs distributed over 30 domains

• More frequent in Yandex than in Google Safe Browsing

URL                                               Matching decompositions

Google
http://wps3b.17buddies.net/wp/cs_sub_7-2.pwf      17buddies.net/wp/cs_sub_7-2.pwf
                                                  17buddies.net/wp/
http://www.1001cartes.org/tag/emergency-issues    1001cartes.org/tag/emergency-issues
                                                  1001cartes.org/tag/

Yandex
http://fr.xhamster.com/user/video                 fr.xhamster.com/
                                                  xhamster.com/
http://nl.xhamster.com/user/video                 nl.xhamster.com/
                                                  xhamster.com/
http://m.wickedpictures.com/user/login            m.wickedpictures.com/
                                                  wickedpictures.com/
http://m.mofos.com/user/login                     m.mofos.com/
                                                  mofos.com/
http://mobile.teenslovehugecocks.com/user/join    mobile.teenslovehugecocks.com/
                                                  teenslovehugecocks.com/

• Including a single prefix for xhamster.com/ blacklists both fr.xhamster.com/ and nl.xhamster.com/

• No need to add an additional prefix for the French or Dutch version

Responsible disclosure and impact

• Disclosure to Mozilla Firefox

“We have long assumed (without the math to back it up) that if Google were evil it could seed the list with prefixes that allowed it to detect whether a few users visited a few select targets” - Mozilla Firefox

• Disclosure to Yandex

“We can’t promise but we plan to study them and provide you with our feedback” - Yandex Safe Browsing team

• Non-disclosure agreement with Google

• Launch of Google Safe Browsing API v4 (in June 2016)

“Google does learn the hash prefixes of URLs but the hash prefixes don’t provide much information about the actual URLs” - Google Safe Browsing v4 privacy policy

Conclusions & Future Work

Conclusions

Lesson learnt: Collisions are hard to tame in security and privacy

Bloom filters

• Developers tend to ignore the worst case of algorithms

• Data structures with ad-hoc crypto primitives are risky at best

• Need for secure instantiations, e.g. as in Count-Min sketches

Safe Browsing

• Re-establishes the weakness of the anonymity-set privacy model

• Google and Yandex both employ the same privacy model
    • Google is privacy aware
    • Yandex less so

Future work

Bloom filters

• On Bloom filters: Bloom paradox [Rottenstreich 2015]

• Beyond Bloom filters: security of Bloom filter variants

Safe Browsing

• Accountability: need for a decentralized blacklist management system [Freudiger et al. 2015]

• Privacy: can a local cache improve Private Information Retrieval?

Thank you

Other works not covered today

• Performance of cryptographic accumulators

• Private password auditing

• (In)Security of Google and Yandex Safe Browsing

• Alerting websites: risks and solutions

• Decompression quines and anti-viruses

• Pitfalls of hashing for privacy

• Linkable (zero-knowledge) proofs for private and accountable gossip

Sending multiple prefixes

Example with two prefixes

Decomposition Prefix

petsymposiumorg2016cfpphp 0xe70ee6d1petsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5

Intuitively

bull Prefix for petsymposiumorg is not enough for re-identification

bull Sending two 32-bit prefixes for petsymposiumorg andpetsymposiumorg2016 asymp sending one 64-bit prefix

bull The maximum anonymity-set size for 64-bit prefixes is 1rArr Should lead to re-identification

31

Ambiguity on two prefixes

bull More than two distinct URLs may yield the same two prefixesbull Consider a user visiting abc with prefixes A and B in local cache

URL Decomposition Prefix

Target URL abcabc A

bc B

Ambiguity

Type I gabcgabc C

abc A

bc B

Type II gbcgbc A larr collision on 32 bits

bc B

Type III defdef A larr collision on 32 bits

ef B larr collision on 32 bits

bull P[Type III] = 1264

bull Type II URLs exist only when decomp on the domain gt 232

bull P[Type I] gt P[Type II] gt P[Type III]

bull Mainly only Type I URLs create ambiguity in re-identification 32

How to track a given URL A real-world example (I)

petsymposiumorg2015

petsymposiumorg

petsymposiumorg2016linksphp

petsymposiumorg2016cfpphp

petsymposiumorg2016faqsphp

Type I URLs for target URL

petsymposiumorg2016

Goal Identify users interested in PETs

bull Target URL is petsymposiumorg2016

33

How to track a given URL A real-world example (II)

bull Target URL has Type I ambiguity with cfpphplinkphpfaqsphp

Decomposition Prefix

petsymposiumorg2016faqsphp 0xaec10b3apetsymposiumorg2016 0x1d13ba6apetsymposiumorg 0x33a02ef5

bull Including 2 prefixes in the local cache rArr Anonymity set size of 4

bull To remove any ambiguitybull Need to include 3 additional prefixes for cfpphp linkphp faqsphpbull A total of 5 prefixes

bull Server receives 2 prefixes rArr visited page is the target URL

bull Server receives 3 prefixes rArr visited page is either of the leaf URLs

bull The third prefix decides which leaf URL was visited

bull Generalizable to any number of prefixes 34

Examples of URLs creating multiple hits

bull Over 1300 such URLs distributed over 30 domainsbull More frequent in Yandex than in Google Safe Browsing

URL matching decomposition

Googlehttpwps3b17buddiesnetwpcs_sub_7-2pwf

17buddiesnetwpcs_sub_7-2pwf17buddiesnetwp

httpwww1001cartesorgtagemergency-issues1001cartesorgtagemergency-issues

1001cartesorgtag

Yandex

httpfrxhamstercomuservideofrxhamstercomxhamstercom

httpnlxhamstercomuservideonlxhamstercomxhamstercom

httpmwickedpicturescomuserloginmwickedpicturescomwickedpicturescom

httpmmofoscomuserloginmmofoscommofoscom

httpmobileteenslovehugecockscomuserjoinmobileteenslovehugecockscom

teenslovehugecockscom

bull Including a single prefix for xhmastercom blacklists bothfrxhmastercom and nlxhamstercom

bull No need to add additional prefix for French or Dutch version35

Responsible disclosure and impact

bull Disclosure to Mozilla Firefox

ldquoWe have long assumed (without the math to back it up) thatif Google were evil it could seed the list with prefixes thatallowed it to detect whether a few users visited a few selecttargetsrdquo mdash Mozilla Firefox

bull Disclosure to Yandex

ldquoWe canrsquot promise but we plan to study them and provide youwith our feedbackrdquo mdash Yandex Safe Browsing team

bull Non-disclosure agreement with Googlebull Launch of Google Safe Browsing API v4 (In June 2016)

ldquoGoogle does learn the hash prefixes of URLs but the hashprefixes donrsquot provide much information about the actualURLsrdquo mdash Google Safe Browsing v4 privacy policy

36

Conclusions & Future Work

Conclusions

Lesson learnt: Collisions are hard to tame in security and privacy

Bloom filters

• Developers tend to ignore the worst case of algorithms

• Data structures with ad-hoc crypto primitives are risky at best

• Need for secure instantiations, e.g., as in Count-Min sketches

Safe Browsing

• Re-establishes the weakness of the anonymity-set privacy model

• Google and Yandex both employ the same privacy model
  • Google is privacy aware
  • Yandex less so

Future work

Bloom filters

• On Bloom filters: Bloom paradox [Rottenstreich 2015]

• Beyond Bloom filters: Security of Bloom filter variants

Safe Browsing

• Accountability: Need for a decentralized blacklist management system [Freudiger et al. 2015]

• Privacy: Can a local cache improve Private Information Retrieval?

Thank you


Other works not covered today

• Performance of cryptographic accumulators

• Private password auditing

• (In)Security of Google and Yandex Safe Browsing

• Alerting websites: Risks and solutions

• Decompression quines and anti-viruses

• Pitfalls of hashing for privacy

• Linkable (zero-knowledge) proofs for private and accountable gossip

Ambiguity on two prefixes

• More than two distinct URLs may yield the same two prefixes
• Consider a user visiting a.b.c with prefixes A and B in the local cache

              URL        Decomposition   Prefix

Target URL    a.b.c      a.b.c/          A
                         b.c/            B

Ambiguity
Type I        g.a.b.c    g.a.b.c/        C
                         a.b.c/          A
                         b.c/            B

Type II       g.b.c      g.b.c/          A  ← collision on 32 bits
                         b.c/            B

Type III      d.e.f      d.e.f/          A  ← collision on 32 bits
                         e.f/            B  ← collision on 32 bits

• P[Type III] = 1/2^64

• Type II URLs exist only when the number of decompositions on the domain is > 2^32

• P[Type I] > P[Type II] > P[Type III]

• Mainly, only Type I URLs create ambiguity in re-identification
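
The value of P[Type III] can be motivated by a short calculation. This is a sketch under an assumption not stated on the slide, namely that 32-bit prefixes behave like independent, uniformly distributed values. A Type III URL needs two simultaneous collisions, one per decomposition:

```latex
\begin{align*}
P[\text{Type III}]
  &= P[\mathrm{prefix}(\texttt{d.e.f/}) = A] \cdot P[\mathrm{prefix}(\texttt{e.f/}) = B] \\
  &= \frac{1}{2^{32}} \cdot \frac{1}{2^{32}} = \frac{1}{2^{64}}
\end{align*}
```

By the same reasoning a Type II URL needs only one collision (on g.b.c/), and a Type I URL needs none because it shares the target's decompositions outright, which is consistent with the ordering P[Type I] > P[Type II] > P[Type III].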

How to track a given URL: A real-world example (I)

Goal: Identify users interested in PETs

• Target URL is petsymposium.org/2016/

[Figure: URL tree rooted at petsymposium.org/, with children petsymposium.org/2015/ and petsymposium.org/2016/. The leaves petsymposium.org/2016/links.php, petsymposium.org/2016/cfp.php and petsymposium.org/2016/faqs.php are the Type I URLs for the target URL petsymposium.org/2016/.]

How to track a given URL: A real-world example (II)

• Target URL has Type I ambiguity with cfp.php, links.php, faqs.php

Decomposition                      Prefix
petsymposium.org/2016/faqs.php     0xaec10b3a
petsymposium.org/2016/             0x1d13ba6a
petsymposium.org/                  0x33a02ef5

• Including 2 prefixes in the local cache ⇒ Anonymity set size of 4

• To remove any ambiguity
  • Need to include 3 additional prefixes for cfp.php, links.php, faqs.php
  • A total of 5 prefixes

• Server receives 2 prefixes ⇒ visited page is the target URL

• Server receives 3 prefixes ⇒ visited page is one of the leaf URLs

• The third prefix decides which leaf URL was visited

• Generalizable to any number of prefixes
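
The re-identification rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tooling used in the work: prefixes are taken to be the first 4 bytes of SHA-256 over the decomposition string, the client's decompositions are passed in directly instead of being derived from a URL, and the names `prefix32`, `client_hits` and `reidentify` are hypothetical. The sketch does not try to reproduce the prefix values listed on the slide.

```python
import hashlib


def prefix32(expression: str) -> str:
    """Hex form of the first 32 bits of SHA-256 (assumed prefix function)."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:4].hex()


DOMAIN = "petsymposium.org/"
TARGET = "petsymposium.org/2016/"
LEAVES = [TARGET + page for page in ("cfp.php", "links.php", "faqs.php")]

# The five prefixes the tracker places in the client's local cache.
TRACKING_PREFIXES = {prefix32(e): e for e in [DOMAIN, TARGET, *LEAVES]}


def client_hits(decompositions):
    """Prefixes the client would send: those found in its local cache."""
    candidates = {prefix32(d) for d in decompositions}
    return {p for p in candidates if p in TRACKING_PREFIXES}


def reidentify(received):
    """Server-side guess of the visited page from the received prefixes."""
    expressions = {TRACKING_PREFIXES[p] for p in received if p in TRACKING_PREFIXES}
    if not {DOMAIN, TARGET} <= expressions:
        return None  # the visit was not under the target URL
    leaves = [e for e in expressions if e in LEAVES]
    if not leaves:
        return TARGET  # exactly 2 prefixes: the target itself
    if len(leaves) == 1:
        return leaves[0]  # 3 prefixes: the third one names the leaf
    return None


# A visit to faqs.php yields three decompositions, hence three prefixes.
visit = ["petsymposium.org/2016/faqs.php", TARGET, DOMAIN]
print(reidentify(client_hits(visit)))
```

A visit to petsymposium.org/2015/ would only hit the domain prefix, so `reidentify` returns `None` for it.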
