Fast and Scalable Pattern Matching for Content Filtering

Sarang Dharmapurikar, John Lockwood


Motivation

● Deep packet inspection
  - Detection of Internet worms, computer viruses, SPAM, copyrighted material
  - Intrusion Detection/Prevention
  - Layer-7 switching
  - Content classification

● Needs fast string matching mechanism

● Some desirable features of the mechanism
  - String matching at line speed
  - Ability to detect strings at random locations in the payload
  - Ability to detect 1000s of strings
  - Ability to handle arbitrarily long strings


Aho-Corasick Algorithm

● Two problems
  - At least 1 memory access per character (at most 2)
      o Slows it down
  - Only one character at a time
      o Bottleneck

s1 : technical
s2 : technically
s3 : tel
s4 : telephone
s5 : phone
s6 : elephant

[Figure: Aho-Corasick automaton for the strings above, with states q0 to q31 and one character per transition]
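The one-character-at-a-time behavior described above can be seen in a minimal sketch of the classic Aho-Corasick automaton. This is an illustrative software version, not the hardware design from the talk; the function names and the sample input are assumptions made for the example.

```python
from collections import deque

PATTERNS = ["technical", "technically", "tel", "telephone", "phone", "elephant"]

def build_automaton(patterns):
    """Build the goto, failure and output tables for a set of patterns."""
    goto = [{}]        # goto[state][char] -> next state
    output = [set()]   # output[state] -> patterns that end at this state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pattern)

    # Breadth-first pass: failure links of depth-1 states stay at the root.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] |= output[fail[nxt]]   # inherit matches ending here
    return goto, fail, output

def search(text, patterns):
    """Return (start_index, pattern) pairs, consuming one character per step."""
    goto, fail, output = build_automaton(patterns)
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]                # follow the failure chain
        state = goto[state].get(ch, 0)
        for pattern in sorted(output[state]):
            matches.append((i - len(pattern) + 1, pattern))
    return matches
```

Every input character costs at least one table lookup, plus extra lookups whenever a failure link is followed, which is the bottleneck the slide points at. For the six example strings the trie has 32 states, matching q0 to q31 in the figure.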


Why not use multiple engines?

[Figure: incoming connections distributed across Engine1, Engine2, Engine3 and Engine4]

Each engine needs plenty of memory

On-chip memory not practical

We need a memory chip

Multiple memory chips: more pins, more power, more cost


Can we…

● Process multiple characters at a time

● Without using multiple memory chips?

● What if we have a small amount of on-chip memory?


Our Approach

● Modify Aho-Corasick to jump ahead by k characters
  - Jump Ahead Aho-CorasicK (JACK)-FA

● Represent JACK-FA as a hash table
  - Keep only one copy in the off-chip memory

● Keep k copies of the compressed & approximate JACK-FA hash table in on-chip memory
  - Use Bloom filters for approximate representation
  - Consumes very little memory

[Figure: the data stream feeds the on-chip approximate JACK-FAs, which are backed by the off-chip JACK-FA]


JACK-FA

Strings split into 4-character segments, with any shorter remainder as a tail:

s1 : tech nica l
s2 : tech nica lly
s3 : tel
s4 : tele phon e
s5 : phon e
s6 : elep hant

[Figure: JACK-FA with states q0 to q7; transitions are labeled with the 4-character segments (tech, nica, tele, phon, elep, hant) and the tails (tel, e, l, lly); matching strings are marked where they are detected]
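The segmentation shown on this slide can be reproduced with a few lines of code. This is a minimal sketch assuming k = 4, as in the example; the function name `segment` is hypothetical.

```python
def segment(pattern, k=4):
    """Split a pattern into full k-character segments plus a shorter tail."""
    cut = len(pattern) - len(pattern) % k      # end of the last full segment
    segments = [pattern[i:i + k] for i in range(0, cut, k)]
    tail = pattern[cut:]                        # empty if the length is a multiple of k
    return segments, tail

PATTERNS = {"s1": "technical", "s2": "technically", "s3": "tel",
            "s4": "telephone", "s5": "phone", "s6": "elephant"}

for sid, pattern in PATTERNS.items():
    segments, tail = segment(pattern)
    print(sid, ":", " ".join(segments + ([tail] if tail else [])))
```

Strings shorter than k, such as "tel", consist of a tail only, which is why the figure attaches "tel" directly to q0.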


String matching with JACK-FA

[Figure: animation of the JACK-FA (states q0 to q7) consuming a data stream that contains "technically" among other characters (x y z, a b c), with a marker w on the stream]



Why we need k JACK-FA

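[Figure: the same JACK-FA and data stream as in the previous slides]

A string can begin at any position in the stream, but a single JACK-FA consumes the stream in fixed groups of k characters. Running k machines, each starting at a different offset, ensures that one of them is aligned with the start of the string.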
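The alignment problem can be illustrated by cutting a stream into k-character groups at each of the k possible offsets. This is a small sketch with k = 4; the input string and the helper name `chunks_at_offset` are assumptions chosen for illustration.

```python
def chunks_at_offset(stream, offset, k=4):
    """Cut the stream into k-character groups, starting at the given offset."""
    return [stream[i:i + k] for i in range(offset, len(stream) - k + 1, k)]

stream = "xyztechnicallyabc"   # hypothetical input containing "technically"

for offset in range(4):
    # Each offset corresponds to one of the k machines.
    print(offset, chunks_at_offset(stream, offset))
```

Only the machine at offset 3 sees the groups "tech" and "nica" that the JACK-FA has transitions for; the other three machines see groups that straddle the segment boundaries.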


Speed up

[Figure: the data stream scanned by several machines in parallel]

A single machine in off-chip memory

k approximate and compressed machines in on-chip memory

Use Bloom filters


Tabular Representation

[Figure: the JACK-FA (states q0 to q7) shown next to its tabular form]

[state, substr]   Next State   Matching str   Failure Chain
[q0, tech]        q1           -              q0
[q0, tele]        q2           S3             q0
[q0, phon]        q3           -              q0
[q0, elep]        q4           -              q0
[q1, nica]        q5           -              q0
[q2, phon]        q6           -              q3, q0
[q4, hant]        q7           S6             q0
[q0, tel]         -            S3             -
[q3, e]           -            S5             -
[q5, lly]         -            S1, S2         -
[q5, l]           -            S1             -
[q6, e]           -            S4, S5         -
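The [state, substr] keys of this table can be derived mechanically from the segmented strings. The sketch below builds a trie over 4-character segments and numbers its states breadth-first, which happens to reproduce the state names used on the slide. It is a simplified illustration: it fills in only the next-state column and the directly owned matches, and it omits the failure chain and the matches inherited from other strings (for example S3 on [q0, tele]). The dictionary layout and all names are assumptions.

```python
from collections import deque

PATTERNS = {"s1": "technical", "s2": "technically", "s3": "tel",
            "s4": "telephone", "s5": "phone", "s6": "elephant"}

def build_table(patterns, k=4):
    """Map (state, substr) -> {"next": state or None, "match": [string ids]}.

    Assumes every pattern is non-empty.
    """
    children = {(): {}}   # segment path -> {segment: child path}
    ends = []             # (path, tail, string id) for each pattern
    for sid, pattern in patterns.items():
        cut = len(pattern) - len(pattern) % k
        path = ()
        for i in range(0, cut, k):
            seg = pattern[i:i + k]
            child = children[path].setdefault(seg, path + (seg,))
            children.setdefault(child, {})
            path = child
        ends.append((path, pattern[cut:], sid))

    # Number the states breadth-first: q0 is the root.
    names, queue = {(): "q0"}, deque([()])
    while queue:
        path = queue.popleft()
        for child in children[path].values():
            names[child] = f"q{len(names)}"
            queue.append(child)

    table = {}
    for path, kids in children.items():
        for seg, child in kids.items():
            table[(names[path], seg)] = {"next": names[child], "match": []}
    for path, tail, sid in ends:
        if tail:   # tail entry: no next state, only a match
            entry = table.setdefault((names[path], tail), {"next": None, "match": []})
        else:      # pattern ends exactly on a segment boundary
            entry = table[(names[path[:-1]], path[-1])]
        entry["match"].append(sid)
    return table

table = build_table(PATTERNS)
```

Storing the automaton as a flat key-value table is what allows it to be kept in a hash table, as the approach slide proposes.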


Implementation with Bloom Filters

[state, substr]   Next State   Matching str   Failure Chain
[q0, tech]        q1           -              q0
[q0, tele]        q2           S3             q0
[q0, phon]        q3           -              q0
[q0, elep]        q4           -              q0
[q1, nica]        q5           -              q0
[q2, phon]        q6           -              q3, q0
[q4, hant]        q7           S6             q0
[q0, tel]         -            S3             -
[q3, e]           -            S5             -
[q5, lly]         -            S1, S2         -
[q5, l]           -            S1             -
[q6, e]           -            S4, S5         -

[Figure: four groups of Bloom filters B1, B2, B3, B4, labeled q1, q2, q3 and q4]


Throughput with Snort strings

● Off-chip memory: 250 MHz QDR-SRAM, 64-bit wide

● String concentration: 1 in 100 characters

● 2250 strings

● 2 to 122 character strings


Conclusions

● Fast string matching is an important module for Content filtering applications

● Off-chip memory accesses slow down string matching

● A large fraction of memory accesses can be avoided
  - Using a small on-chip memory and Bloom filters

● Our accelerated Aho-Corasick algorithm can process 2250 strings with less than 50KB on-chip memory
  - At a speed of more than 10Gbps


Thanks!

Questions ?


Motivation

● The multi-pattern matching algorithm works for short strings (16 bytes)
  - Hash computation over long strings becomes problematic
  - Some virus signatures can be several hundred bytes long
  - Snort's longest string is 122 bytes

[Figure: distribution of Snort string lengths; x-axis: string length in bytes (0 to 140), y-axis: # strings (0 to 180)]


Accelerated Aho-Corasick Algorithm

● How to support arbitrarily large strings?
  - At the cost of more memory?
  - Break a long string into multiple smaller pieces
  - Stitch them in a state machine
  - Match individual segment and track the state machine

[Figure: state machine q0, q1, q2, q3 for "technically"; the segments "tech" and "nica" are labeled as symbols and "lly" as the tail]


Speed up

[Figure: the data stream annotated with s1, s2, s3 and s4]


Multiple machines

[Figure: animation of multiple machines working on the data stream, annotated with s1, s2, s3 and s4]



Bloom Filter

[Figure sequence: an m-bit array and hash functions H1, H2, H3, H4, ..., Hk]

● Insert X: the bits selected by H1 to Hk are set to 1

● Insert Y: the bits selected for Y are set to 1 as well

● Query X: all selected bits are 1, so the filter reports a match

● Query W: all selected bits happen to be 1 although W was never inserted, so the filter reports a match (false positive)
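The insert and query steps from these slides fit in a few lines. In this minimal sketch the m-bit array is held in a single Python integer, and the k hash functions are simulated by salting SHA-256, which is an assumption made for the example rather than the hash family used in the talk.

```python
import hashlib

def bit_positions(item, m, k):
    """Positions selected by the k hash functions H1..Hk in an m-bit array."""
    positions = []
    for i in range(k):
        digest = hashlib.sha256(f"{i}|{item}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % m)
    return positions

def insert(bits, item, m, k):
    """Return the array with the item's k bits set to 1."""
    for pos in bit_positions(item, m, k):
        bits |= 1 << pos
    return bits

def query(bits, item, m, k):
    """True if all k bits are set: 'possibly present'. False: definitely absent."""
    return all((bits >> pos) & 1 for pos in bit_positions(item, m, k))

m, k = 32, 5
bits = 0
bits = insert(bits, "X", m, k)
bits = insert(bits, "Y", m, k)
print(query(bits, "X", m, k), query(bits, "Y", m, k))
```

An inserted item always answers true, so there are no false negatives. An item that was never inserted answers true only if other insertions happened to set all of its k bits, which is the false-positive case shown for W.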


Bloom filter

[Figure: a Bloom filter block answering the query "Is x present in the filter?" with {No, Yes}]

A "Yes" can be a false positive

But the false positive probability is very small, e.g. 0.001

Represents a set of strings

Each string consumes very few bits, e.g. 12 to 16 bits
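The two figures quoted here are consistent with the standard Bloom filter approximation f = (1 - e^(-k/b))^k, where b is the number of bits per stored string and k the number of hash functions. The sketch below evaluates it, assuming k is chosen near its optimum b ln 2; the function name is hypothetical.

```python
import math

def false_positive_rate(bits_per_item, num_hashes=None):
    """Approximate false-positive probability for b bits per item and k hashes."""
    if num_hashes is None:
        # Near-optimal number of hash functions: k = b * ln 2, rounded.
        num_hashes = max(1, round(bits_per_item * math.log(2)))
    return (1.0 - math.exp(-num_hashes / bits_per_item)) ** num_hashes

for b in (12, 16):
    print(b, "bits per string ->", false_positive_rate(b))
```

Under these assumptions the rate is roughly 0.003 at 12 bits per string and roughly 0.0005 at 16 bits, which brackets the 0.001 mentioned on the slide.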