44
Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

Embed Size (px)

Citation preview

Page 1: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

Fighting Spam May Be Easier Than You Think

Cynthia DworkMicrosoft Research SVC

Page 2: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

2

Why?

Huge problemIndustry: costs in worker attention, infrastructure

Individuals: increased ISP fees

Hotmail: huge storage costs, (was?) 65-85%

FTC: fraud, confidence crimes

Ruining e-mail, devaluing the Internet

Page 3: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

3

Principal Techniques Filtering

Everyone: text-based Brightmail: decoys; rules updates Microsoft Research: (seeded) trainable filters SpamCloud: collaborative filtering SpamCop, Osirusoft, etc: IP addresses, proxies, …

Make Sender Pay Computation [Dwork-Naor’92; Back’97] Human Attention [Naor’96, SRC patent] Money [Gates’96] and Bonds [Vanquish]

Page 4: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

4

Outline The Computational Approach (“puzzles”)

Overview; economic considerations Burrows’ suggestion: memory-bound functions Social and Technical issues

Turing Tests Architectures

Point-to-Point & AmanuensisCentralized Ticket Server

Deeper Discussion of Memory-Bound Functions3 Approaches and Small-Space CryptanalysesNew Proposal (joint with A. Goldberg and M. Naor)

Page 5: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

5

Computational Approach

If I don’t know you:Prove you spent ten seconds CPU time,

just for me, and just for this message

User Experience:Automatically and in the backgroundChecking proof extremely easy

All unsolicited mail treated equally

Page 6: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

6

Economics (80,000 s/day) / (10s/message) = 8,000 msgs/day

Hotmail’s billion daily spams: 125,000 CPUs Up front capital cost just for HM: circa $150,000,000

The spammers can’t afford it.

Page 7: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

7

Who Are the Spammers?

"Most of the spammers are not wealthy people," said Stephen Kline, a lawyer for the New York State attorney general's office.

Page 8: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

8

Page 9: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

9

Who Are the Spammers? Spamhaus ROKSO 100/90% Circa 300 people total; very top few spammers

make a few million/year (F. Krueger, SMN; also, see the recent articles about Alan Ralsky)

Comparison: FastClick, with 30% of popunder market, has profit of $2 mil/yr (income of $4 mil/yr)

Biggest publicly traded company: eUniverse ($6 mil profit); list of 108 “good” names

Hit ratio: 10-6 to 10-5

Page 10: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

10

Cryptographic Puzzles

Hard to compute; f(S,R,t,nonce) can’t be amortizedlots of work for the sender

Easy to check “z = f(S,R,t,nonce)”little work for receiver

Parameterized to scale with Moore's Laweasy to exponentially increase computational cost, while

barely increasing checking cost

Can be based on (carefully) weakened signature schemes, hash collisions

Can arrange a “shortcut” for post office

Page 11: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

11

Burrows’ Suggestion CPU speeds vary widely across machines, but

memory latencies vary much less (10+ vs 4)

So: design a puzzle leading to a large number of cache misses

Page 12: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

12

Social Issues

Who chooses f?One global f? Who sets the price?

Autonomously chosen f’s?

How is f distributed (ultimately)?Global f built into all mail clients? (1-pass)

Directory? Query-Response? (3-pass)

Page 13: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

13

Technical Issues Distribution Lists (!)

Awkward Introductory PeriodOld versions of mail programs; bounces

Very Slow/Small-Memory MachinesCan implement “post office” (CPU), but:

Who gets to be the Post Office? Trust?

Cache Thrashing (memory-bound)

The Subverters

Page 14: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

14

Turing Tests: Payment in Kind [N’96;SRC] CAPTCHAs (Completely Automated Public T. test

for telling Computers and Humans Apart)Defeat automated account generation

5-10% drop in subscription rate

Teams of conjectured-humans (8-hour shifts)

Yes: Distorted images of simple word

No: ``Find 3 words in image with multiple overlapping images of words''

Others: subject classification, audio

M. Blum: people have done preprocessing

Page 15: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

15

Social and Technical Issues Social (especially in enterprise setting)

ADA, S.508 (blind, dyslexic) Not ``professional'' Productivity cost: context switch (may be mitigated in

architecture supporting pre-computation) Wrong direction: we should offload crap onto machines

Technical No theory; if broken, can’t scale. Idrive, AltaVista, broken [J] EZ Gimpy (Yahoo!) broken [M+]

Page 16: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

16

Turing Test Economic Issues Human time in poor country has roughly

same cost as computer time (even if 8/5 wk) Also need computer, but it can be slow, cheap10s same ballpark as some Turing tests

Suppose we’re wrong about cost (a few hundredths of a cent), and really the price should be 1 cent1-cent challenge costs 2 minutes. No one not

being paid would dream of submitting to this.

Page 17: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

17

Cycle Stealing Stealing cycles from humans: Pornography companies

require random users to solve a CAPTCHA before seeing the next image [vABL]

Worse for computational challenges

Economics: lots of currency means currency is devalued. Prices go up.

There are lots of cycles, but anyone can buy them. Can go into business brokering cycles.

Page 18: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

18

Point-to-Point Architecture

(Ideal Message Flow)Single-pass “send-and-forget”Can augment with amanuensis to handle slow machinesCan add post office / pricing authority to handle money paymentsTime mostly used as nonce for avoiding replays (cache tags, discard

duplicates; time controls size of cache)

Sender client

S

Sender client

S

Recipient client

R

Recipient client

R

m, f(S,R,t,nonce)

Page 19: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

19

Here to There (and There’)

Three e-mail messages R’s mail client caches m, S, R, t, nonce Bounce (viral marketing opportunity)

html attachment with Java Script for f contains parameters for f (S, R, t,nonce)clicking on link causes computation, sending e-mail(optional) link for download of client software

Ignorant Sender

S

Ignorant Sender

S

Spam-Protected

Recipient

R

Spam-Protected

Recipient

R

m

bouncebounce

release mrelease m

Page 20: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

20

Point-to-Point: Issues Social

Senders trust code (transition period)?

(Who gets to be a pricing authority?)

TechnicalNo pre-computation possible (slow machines)

Function update requires new download

Sender’s browser must be configured for sending e-mail (transition period)

Page 21: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

21

Remark: Web-Based Mail Computation done by client (applet or

JavaScript)

Verification and safelist checking done by server

Page 22: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

22

(Ideal Message Flow)

Ticket kit = (#, puzzle)

Ticket = (#, response) Any payment method Tickets may be accumulated in

advance (pre-computation). Useful if price is high.

Refunds (not shown) Centralization eases updates

Ticket Server [ABBW]

Sender

RecipientServer

Ticket Server

GetTicket

Kit

HTTP

MSG + TicketSMTP

HTTP TicketOK?

1,2

4,5

3

Page 23: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

23

Social and Technical Issues Social

Who gets to be a ticket server? Trust? Privacy?

Federation?

Trust code (transition period)

TechnicalComplex; 5 flows (7 with refund)

T may be on different continent than S, R

Target of choice for subverters

Page 24: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

24

Memory-Bound Functions [ABMW’02]

Devilish model! assume cache size (of spammer’s big

machine) is half the size of memory (of weakest sending machine)

must avoid exploitation of locality, cache lines (factors of 64 or even 16 matter)

watch out for low-space crypto attacks

Page 25: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

25

Meet in Middle (eg, using DES)Input: x, y partially specified K1, K2

Output: K1, K2

Si = {keys consistent with Ki }

Create table T with all EK1(x), (so |T| = |S1|)

8 K2 2 S2:

compute z=(EK2)-1 (y)

check if z 2 T

Optimism: need lots of space to store T; won’t fit in cache; hard to generate the z’s in useful order

DES

DES

x

y

K1

K2

Page 26: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

26

Analysis of Meet-in-the-Middle Tradeoff: cache size c < |S1| yields time

proportional to |S1|¢|S2|/c with no cache misses

Sequentiality attacks [AB]: Once T is prepared, cache-sized blocks can be loaded sequentially.

Luck: challenge may be resolved in cache. Example: if c = |S1|/4 then there is a ¼ chance of resolution with no further cache misses once the cache has been loaded with the first block of T.

Page 27: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

27

Subset Sum Input: Random 2k-bit numbers a1, … , a2k, T

Output: subset of ai’s summing to T mod 22k

Optimism:LLL lattice basis reduction dynamic programming: simultaneously O(2k)

time and space

Analysis: killed by Schroeppel and Shamir,

Simultaneous T=O(2k), S=O(2k/2)

Page 28: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

28

Easy Functions [ABMW’02]

(simplified construction) f: n bits to n bits, easy Given xk 2 range(f(k)), find a

pre-image with certain properties

Hope: best solved by building table for f-1 and working back from xk

Choose n=22 so f -1 fits in small memory, but not in cache

Optimism: xk is root of tree of expected size k2

Page 29: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

29

Analysis of Easy Functions Vulnerable to T/S tradeoffs for inverting

functions [Hellman; Fiat-Naor]TS2 = O(N2)cpuf

Cost of inverting f without memory accesses is only O(4) times cost of computing f (S=N/2); no cache misses!

As clock speeds increase, must increase cpuf

Page 30: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

30

Interpretation: Easy-Function challenges are cpu-bound, but

their structure (the cpu-avoiding, table-lookup method) dampens the effect of Moore’s Law.

Verifier’s costs increase with Moore’s Lawk ¢ cpuf cycles

Page 31: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

31

Our Approach

The sender makes a sequence of random memory accesses and sends a proof of having done so to the receiver, who needs a small number of accesses for verification. The memory access pattern leads to many cache misses, making the computation memory-bound.

How to generate random-looking access pattern?

What is an easily verifiable proof?

Page 32: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

32

High Level Description Shared array of random numbers T A path is a sequence of inherently sequential

accesses to T Sender must search a set S of paths to find a

desired path Process is message/recipient dependent Sender sends a proof of finding desired path to

receiver, who must verify Ratio of sender/receiver work is S

Page 33: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

33

Incompressibility and RC4 Big (222) Fixed-Forever Truly Random T

Model walk on prg of RC4 stream cipher

Intuition: Use the pr bit stream (seeded by message) to choose

entries of T to access.

These, in turn, affect the stream (defending against pre-processing to order T so as to exploit locality).

Page 34: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

34

RC4 (Partial Description)

Initialize A to pseudo-random permutation of {1, 2, …, 256} using a seed K

Initialize Indices:i = 0; j = 0

PR generation loop:i = i + 1

j = j + A[i]

Swap (A[i], A[j]);

Output z = A[A[i] + A[j]]

Page 35: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

35

DGN: Initializing A and Basic Step Fixed-Forever truly

random A of size 256; 32-bit words

Given m and trial number i, compute strong cryptographic 256-word mask M.

Set A = M © A.

c = A mod 222

Initialize Indices:d=0; e=0

Walk:d = d + 1

e = e + A[d]

A[d] = A[d] © T[c]

Swap (A[d], A[e])

c = T[c] © A[A[e] + A[d]]

Page 36: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

36

Side by SideGenerate pseudo-

random permutation of {1, 2, …, 256} using a seed K

Fixed-Forever truly random A of size 256; 32-bit words

Given m and trial number i, compute strong cryptographic 256-word mask M.

Set A = M © A.

c = A mod 222

Page 37: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

37

Side by Side Initialize Indices:

i = 0; j = 0

PR generation loop:i = i + 1

j = j + A[i]

Swap (A[i], A[j])

Output z = A[A[i]+A[j]]

Initialize Indices: d=0; e=0

Walk:d = d + 1

e = e + A[d]

A[d] = A[d] © T[c]

Swap (A[d], A[e])

c = T[c] © A[A[e] + A[d]]

Page 38: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

38

Full DGN

E: factor by which computation cost exceeds verification

L: length of walk

Trial i succeeds if hash of A after ith walk ends in 1+log2E zeroes (expected number of trials is E).

Verification: trace the walk (L memory accesses)

Page 39: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

39

Parallel Attack [ABMW] Process many paths in parallel; do

memory accesses in sorted order to take advantage of cache lines

Say attack succeeds if the number of occupied cache lines is ¾ the number of parallel paths (ie, reduce #lines by 1/3)

Probabilistic analysis shows attack needs over 217 paths, but only 214 different A’s fit into 16MB cache

Page 40: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

40

Parameters

Want ELt=10s, whereL = path length

t = memory latency = 0.2 microseconds

Reasonable choices: E = 24,000, L = 2048

Page 41: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

41

Intuition for Why it Works

Fight tradeoffs using incompressible T

Fight preprocessing of T by using message+trial dependent A in computing walk

Fight preprocessing of A by modifying using T

Fight locality: choice of E and trial-dependence of A

Good statistics for walks by analogy with statistics for heavily-studied RC4 (?)

Page 42: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

42

Lower Bound for Abstract Version

General formal model

Abstraction of our algorithm using idealized hash functions

Proof that amortized number of cache misses is linear in desired number

Page 43: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

43

Summary

Discussed computational approach, Turing tests, economics, cycle-stealing

Briefly mentioned two architectures

Examined difficulties of constructing memory-bound pricing functions and proposed a new one designed to avoid these difficulties

Lower bounds for idealization.

Page 44: Fighting Spam May Be Easier Than You Think Cynthia Dwork Microsoft Research SVC

44

Future Work and Open Questions

Stanford undergrads just completed simple Outlook proxy and Unix filters; web-based implementation in progress; next steps: handle bounces, add signatures

Distribution Lists

Good MB functions with short descriptions (will subset product work)?