Application of Bloom Filters for Longest Prefix Matching and String Matching

Sarang Dharmapurikar

With contributions from: Praveen Krishnamurthy, David Taylor, Todd Sproull and John Lockwood


Agenda

● Background on Bloom filters

● Application to Longest Prefix Matching

● Application to String Matching

● Snort on Chip (if time permits)


Bloom Filter

[Figure: programming. Element X is hashed by k hash functions H1, H2, H3, H4, ..., Hk; each hash value selects one bit of the m-bit array, and that bit is set to 1.]

[Figure: programming a second element, Y, sets further bits in the same m-bit array.]

[Figure: query. X is hashed again; all k selected bits are 1, so the filter reports a match.]

[Figure: query for W. All k selected bits are 1, so the filter reports a match (false positive).]
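The four figures above can be sketched in a few lines of Python. This is only an illustration, not the hardware design from the talk: the class name `BloomFilter`, the salted SHA-256 hashing, and the sizes `m=1024`, `k=5` are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for clarity

    def _positions(self, item: bytes):
        # Derive k indices by salting SHA-256 with the hash-function number.
        # (Illustrative choice; hardware would use much cheaper hashes.)
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        # Programming: set the k selected bits.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: bytes) -> bool:
        # Query: a match requires all k selected bits to be set.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=5)
bf.add(b"X")
bf.add(b"Y")
print(b"X" in bf)  # True: programmed elements always match
```

A query for an element that was never added usually returns False, but may return True when other elements happen to have set all of its bits; that is the false positive case.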


Optimal Parameters of a Bloom filter

● n : number of messages to be stored
● k : number of hash functions
● m : the size of the bit-array (memory)

● The false positive probability
  $f = (1/2)^k$

● The optimal number of hash functions, k, is
  $k = \ln 2 \times m/n = 0.693 \times m/n$


Key Point : False positive probability decreases exponentially with linear increase in the number of hash functions & memory
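As a rough numerical illustration of the key point, the sketch below evaluates the two formulas from this slide for a few memory sizes. The function names and the sample values (1000 messages, 8 to 24 bits per message) are assumptions for the example only.

```python
import math


def optimal_k(m: int, n: int) -> float:
    """Optimal number of hash functions, k = ln2 * m / n."""
    return math.log(2) * m / n


def false_positive(k: float) -> float:
    """False positive probability at the optimal k, f = (1/2)^k."""
    return 0.5 ** k


n = 1000  # hypothetical number of stored messages
for bits_per_item in (8, 16, 24):
    k = optimal_k(bits_per_item * n, n)
    print(bits_per_item, round(k, 2), false_positive(k))
```

Each additional 8 bits per message adds about 5.5 to the optimal k, and so multiplies f by roughly the same small factor each time.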


Longest Prefix Matching

“Longest Prefix Matching Using Bloom Filters”
Sarang Dharmapurikar, Praveen Krishnamurthy, David E. Taylor
SIGCOMM 2003


Motivation

● Router speed depends on Longest Prefix Matching (LPM)
  - IP lookup requires LPM
    o Find the longest matching prefix of the destination IP address in the routing table and retrieve the next hop

● Algorithmic approaches
  - Controlled Prefix Expansion (CPE) and variants
  - Lulea, Tree Bitmap
  - Binary search on prefix lengths
  - Low power and low cost
    o Memory accesses are a bottleneck

● Device based approach
  - TCAM: more power and more cost compared to SRAM


Desirable Features for LPM

● High speed

OC-768 => 125 million lookups per second

● Low power

● Low cost

● Feasible to implement

● Fast incremental route updates

● Scalable with IP address length (for IPv6)


Longest Prefix Matching

Destination IP address: 10101110110110110110100010110111

Prefix                              Next hop
0*                                  3
1*                                  4
01*                                 6
10*                                 13
11*                                 22
010*                                7
011*                                12
101*                                56
0010*                               4
1001*                               5
1010*                               13
1011011010100101001111111111111*    7
10110110101010111101001111111111    5
11011110111101010111111111111111    13


Longest Prefix Matching

Destination IP address: 10101110110110110110100010110111

[Figure: the same prefix / next hop table as on the previous slide, with a row of prefix-length markers above it: 1, 2, 3, 4, ..., 20, 21, ..., 24, 25, ..., 32.]


Longest Prefix Matching

Destination IP address: 10101110110110110110100010110111

Hash Table, with the prefixes grouped by prefix length (length markers on the slide: 1, 2, 3, 4, ..., 20, 21, ..., 24, 25, ..., 32)

Length  Prefix                              Next hop
1       0*                                  3
1       1*                                  4
2       01*                                 6
2       10*                                 13
2       11*                                 22
3       010*                                7
3       011*                                12
3       101*                                56
4       0010*                               4
4       1001*                               5
4       1010*                               13
31      1011011010100101001111111111111*    7
32      10110110101010111101001111111111    5
32      11011110111101010111111111111111    13



Longest Prefix Matching

Destination IP address: 10101110110110110110100010110111

[Figure: the same Hash Table of prefixes and next hops, grouped by prefix length, now with a Hash Function block between the destination IP address and the table.]
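The hash-table-only lookup pictured on these slides can be sketched as follows. The prefixes and next hops are copied from the slide table; the per-length dictionary layout and the longest-first probing order are assumptions about how such a baseline would typically be written, not the author's code.

```python
# Routing table from the slides: prefix bit-string -> next hop.
ROUTES = {
    "0": 3, "1": 4, "01": 6, "10": 13, "11": 22,
    "010": 7, "011": 12, "101": 56,
    "0010": 4, "1001": 5, "1010": 13,
    "1011011010100101001111111111111": 7,
    "10110110101010111101001111111111": 5,
    "11011110111101010111111111111111": 13,
}

# One hash table per prefix length.
tables = {}
for prefix, hop in ROUTES.items():
    tables.setdefault(len(prefix), {})[prefix] = hop


def lookup(addr: str):
    """Probe the per-length tables from longest to shortest.

    Returns (next_hop, number_of_hash_probes).
    """
    probes = 0
    for length in sorted(tables, reverse=True):
        probes += 1
        hop = tables[length].get(addr[:length])
        if hop is not None:
            return hop, probes
    return None, probes


print(lookup("10101110110110110110100010110111"))  # (13, 3)
```

Every populated prefix length costs one hash probe until a match is found, which is the cost the Bloom filters are meant to avoid.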


System Overview

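[Figure: block diagram. The destination IP address is fed to the Bloom filters; their outputs go to a Priority Encoder, which drives the Hash Table Interface; the Hash Table of (prefix, next hop) entries returns the Next Hop.]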
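A minimal software sketch of this pipeline, assuming one Bloom filter per prefix length and a dictionary standing in for the off-chip hash table. The class name `LpmEngine`, the salted SHA-256 hashing, the filter size, and the reduced route set are all illustrative assumptions.

```python
import hashlib

# A subset of the slide's routing table, enough to show the mechanism.
ROUTES = {"0": 3, "1": 4, "10": 13, "101": 56, "1010": 13, "1001": 5}


def positions(item: str, m: int, k: int):
    """k bit positions for item (salted SHA-256; an illustrative choice)."""
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % m


class LpmEngine:
    def __init__(self, routes, m=256, k=4):
        self.m, self.k = m, k
        self.hash_table = dict(routes)   # stands in for off-chip memory
        self.filters = {}                # prefix length -> bit array
        for prefix in routes:
            bits = self.filters.setdefault(len(prefix), bytearray(m))
            for pos in positions(prefix, m, k):
                bits[pos] = 1

    def lookup(self, addr: str):
        # Step 1: query every Bloom filter (done in parallel in hardware).
        matches = [
            length for length, bits in self.filters.items()
            if all(bits[p] for p in positions(addr[:length], self.m, self.k))
        ]
        # Step 2: priority encoding: probe the hash table longest-first.
        probes = 0
        for length in sorted(matches, reverse=True):
            probes += 1
            hop = self.hash_table.get(addr[:length])
            if hop is not None:          # true match
                return hop, probes
            # otherwise it was a false positive; try the next length
        return None, probes


engine = LpmEngine(ROUTES)
hop, probes = engine.lookup("10101110110110110110100010110111")
print(hop, probes)
```

Because a Bloom filter never misses a stored prefix, the result is always the true longest match; false positives only add extra hash probes.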


Effect of False Positives

● On average, one Bloom filter makes f false hash probes per lookup
● B Bloom filters make Bf false hash probes
● One additional true hash probe is required for route lookup
● Expected hash probes per lookup
  $E_{exp} \le Bf + 1$
● Worst case hash probes per lookup
  $E_{worst} = B + 1$
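To get a feel for these two bounds, the short sketch below evaluates them for B = 32 filters and a few false positive probabilities. The function names and the sample values of f are assumptions for illustration.

```python
def expected_probes(B: int, f: float) -> float:
    """Upper bound on average hash probes per lookup: B*f + 1."""
    return B * f + 1


def worst_case_probes(B: int) -> int:
    """Every filter reports a false positive, plus the true probe: B + 1."""
    return B + 1


for f in (1e-2, 1e-3, 1e-4):
    print(f, expected_probes(32, f), worst_case_probes(32))
```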


Tuning False Positive

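● Uniform false positive probability for all the Bloom filters required

● Same number of hash functions in all Bloom filters
  $k = \frac{m_i}{n_i} \ln 2$

● Hence,
  $\frac{m_1}{n_1} = \frac{m_2}{n_2} = \dots = \frac{m_{32}}{n_{32}} = \frac{\sum_i m_i}{\sum_i n_i} = \frac{M}{N}$

● Thus,
  $k = \frac{M}{N} \ln 2$
  $f = \left(\frac{1}{2}\right)^{(M/N) \ln 2}$

Here m_i and n_i are the memory and the number of prefixes of the Bloom filter for prefix length i, M is the total Bloom filter memory, and N is the total number of prefixes.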
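The proportional allocation implied by these equalities can be sketched as below. The prefix-length distribution and the 2,000,000-bit budget are hypothetical values, and `allocate` is a name introduced only for this example.

```python
import math


def allocate(total_bits: int, prefix_counts: dict) -> dict:
    """Split M bits so that m_i / n_i = M / N for every prefix length i."""
    N = sum(prefix_counts.values())
    return {i: total_bits * n_i / N for i, n_i in prefix_counts.items()}


# Hypothetical prefix-length distribution (not from the slides).
counts = {16: 10_000, 24: 80_000, 32: 10_000}
M = 2_000_000
sizes = allocate(M, counts)
k = M / sum(counts.values()) * math.log(2)   # same k for every filter
f = 0.5 ** k                                  # same f for every filter
print(sizes, round(k, 2), f)
```

Giving each filter memory in proportion to the prefixes it stores is what makes a single k, and therefore a single f, optimal for all of them.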


Basic Configuration

[Figure: expected # of hash probes per lookup (y-axis, 1 to 6) versus size of embedded memory in MBits (x-axis, 1 to 4), with one curve each for 100000, 150000, 200000 and 250000 prefixes. Annotations: "Less than 2 hash probes per lookup" and "Close to 1 hash probe per lookup".]

$E_{exp} = 32 \times \left(\frac{1}{2}\right)^{(M/N) \ln 2} + 1$

$E_{worst} = 32$


Direct Lookup Array

● Expand prefixes of length 1-19 into 20 bit prefixes

● Direct lookup on 20 bit prefixes in a table
  - Needs 2^20 entries in the off-chip memory

● Eliminate first 20 Bloom filters
  - Fewer prefixes to store in Bloom filters

[Figure: prefix lengths 1, 2, 3, 4, ..., 20 are served by a Direct Lookup Table with 2^20 entries; lengths 21, ..., 24, 25, ..., 32 are served by 12 Bloom filters and the Hash Table.]
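The expansion step can be sketched on a toy scale. The example below uses a 4-bit table instead of the 20-bit one so that it stays readable; `build_direct_table` and the small route set are assumptions for illustration, and the write order (shorter prefixes first) is one common way to preserve longest-match semantics.

```python
def build_direct_table(routes: dict, width: int) -> list:
    """Expand every prefix of at most `width` bits to `width` bits.

    Shorter prefixes are written first so that longer (more specific)
    prefixes overwrite them, preserving longest-prefix semantics.
    """
    table = [None] * (1 << width)
    for prefix in sorted(routes, key=len):
        if len(prefix) > width:
            continue  # longer prefixes stay in the Bloom filters / hash table
        free_bits = width - len(prefix)
        base = int(prefix, 2) << free_bits
        for offset in range(1 << free_bits):
            table[base + offset] = routes[prefix]
    return table


# Toy width of 4 bits instead of 20, so the table has 2**4 entries.
routes = {"0": 3, "1": 4, "01": 6, "10": 13, "101": 56}
table = build_direct_table(routes, width=4)
print(table[int("1011", 2)])  # 56: "101" is the longest match for 1011
```

A lookup then indexes the table with the first `width` bits of the address, at the cost of one memory access and no hashing.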


Direct Lookup Array

● N20 = original number of prefixes of length up to 20 bits

● Typically N20 = 21% of N

[Figure: expected # of hash probes per lookup (y-axis, 1 to 2) versus size of embedded memory in MBits (x-axis, 1 to 4), with one curve each for 100000, 150000, 200000 and 250000 prefixes. Annotation: "Less than 1.1 hash probes per lookup".]

$E_{exp} = 12 \times \left(\frac{1}{2}\right)^{\frac{M}{N - N_{20}} \ln 2} + 1$

$E_{worst} = 12 + 1 = 13$


Controlled Prefix Expansion

● Expand prefixes of length 21-23 into 24 bits

● Expand prefixes of length 25-31 into 32 bits

● Off-chip Hash table contains only 24 bit and 32 bit prefixes

[Figure: the Direct Lookup Table (2^20 entries), the Hash Table, and the Bloom filters for prefix lengths 21 to 32, with example prefixes being expanded to lengths 24 and 32.]


Controlled Prefix Expansion

● N32 = original 25-32 bit prefixes
  - 0.2% of the total

● α32 = expansion factor for 25-32 bit prefix expansion
  - Typically α32 = 50

● N24 = original 21-24 bit prefixes
  - 75.2% of the total

● α24 = expansion factor for 21-24 bit prefix expansion
  - Typically α24 = 1.8

[Figure: expected # of hash probes per lookup (y-axis, 1 to 1.6) versus size of embedded memory in MBits (x-axis, 1 to 4), with one curve each for 100000, 150000, 200000 and 250000 prefixes. Annotation: "Less than 1.2 hash probes per lookup".]

$E_{exp} = 2 \times \left(\frac{1}{2}\right)^{\frac{M}{\alpha_{24} N_{24} + \alpha_{32} N_{32}} \ln 2} + 1$

$E_{worst} = 2 + 1 = 3$


Simulation results

● 15 IPv4 BGP tables
● Avg. of N = 115,000 prefixes
● 2 Mbits of embedded RAM considered

● Scheme 1: 32 Bloom filters
● Scheme 2: 12 Bloom filters and Direct Lookup Array
● Scheme 3: 2 Bloom filters and Direct Lookup Array

Scheme   Metric    Theoretical   Observed
1        Eexp      1.007670      1.007390
1        Eworst    32            3
2        Eexp      1.000204      1.000898
2        Eworst    13            3
3        Eexp      1.006005      1.003265
3        Eworst    3             3
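The theoretical expected-probe formulas for the three schemes can be evaluated with the "typical" figures quoted on the earlier slides. This is a back-of-the-envelope sketch, not a reproduction of the simulation: it reads "2 Mbits" as 2 × 10^6 bits and uses the typical percentages rather than the 15 actual BGP tables, so its output is not expected to match the table digit for digit.

```python
import math


def e_exp(B: int, M: float, n_stored: float) -> float:
    """B * (1/2)^((M / n_stored) * ln 2) + 1."""
    return B * 0.5 ** (M / n_stored * math.log(2)) + 1


N = 115_000          # average table size quoted on the slide
M = 2_000_000        # "2 Mbits", read here as 2 * 10**6 bits (assumption)

# Scheme 1: 32 Bloom filters hold all N prefixes.
scheme1 = e_exp(32, M, N)
# Scheme 2: 12 Bloom filters hold N - N20, with N20 = 21% of N.
scheme2 = e_exp(12, M, N - 0.21 * N)
# Scheme 3: 2 Bloom filters hold alpha24*N24 + alpha32*N32 expanded prefixes.
scheme3 = e_exp(2, M, 1.8 * 0.752 * N + 50 * 0.002 * N)
print(scheme1, scheme2, scheme3)
```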


Hardware Implementation Consideration

● Supporting multiple hash functions in memory
  - The number of hash functions in each Bloom filter can be 10 to 20
  - Each hash function requires one memory port for a random lookup
  - How to support so many ports on memory?

● Use multiple memory cores
  - Restricts the range of each hash function to one memory core
  - Insignificant effect on false positive probability


Future Work: Handling Worst Case

● Worst case can be a problem particularly for IPv6

● Hybrid schemes involving TCAM and Bloom filters?
  - Worst case is due to the number of unique Bloom filters
  - Reduce the number of Bloom filters by:
    o Using TCAM for prefix lengths with fewer prefixes
    o And/or using CPE
  - Reduce the false positive probability of the individual Bloom filter to "almost" zero!
    o For the Bloom filter of prefix length i, use # hash functions k_i = i
    o Hence # of false matches possible = (1/2)^i x 2^i = 1
    o # bits per prefix = i/ln2 = 1.44i < 2i = # TCAM bits per prefix
    o When a Bloom filter requires too many hash functions, use fewer hash functions but more memory to achieve the desired effect
  - Or, maintain a TCAM cache for the prefixes that match multiple Bloom filters
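The arithmetic behind the k_i = i proposal follows from the optimal-parameter formulas given earlier in the talk. A short worked version, with m_i, n_i and f_i denoting the memory, prefix count and false positive probability of the length-i filter (subscripted names introduced here for clarity):

```latex
\begin{align*}
k = \frac{m}{n}\ln 2
  &\;\Longrightarrow\; \frac{m}{n} = \frac{k}{\ln 2} \\
k_i = i
  &\;\Longrightarrow\; \frac{m_i}{n_i} = \frac{i}{\ln 2} \approx 1.44\,i < 2i \\
f_i = \left(\tfrac{1}{2}\right)^{k_i} = \left(\tfrac{1}{2}\right)^{i}
  &\;\Longrightarrow\; 2^{i} \cdot f_i = 2^{i} \cdot \left(\tfrac{1}{2}\right)^{i} = 1
\end{align*}
```

The last line says that, out of all 2^i possible i-bit strings, about one is expected to produce a false match in the length-i filter.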


String Matching

“Deep Packet Inspection Using Parallel Bloom Filters”
Sarang Dharmapurikar, Praveen Krishnamurthy, Todd Sproull, John Lockwood
Hot Interconnects 2003


Motivation

● Applications of deep packet inspection
  - Detection of Internet worms, computer viruses, SPAM, copyrighted material
  - Layer-7 switching
  - Content classification

● String detection mechanism is a common infrastructure

● Some desirable features of the mechanism
  - String matching at line speed
  - Ability to detect strings at random locations in the payload
  - Ability to detect 1000s of strings
  - Easy incremental updates to the string database
  - Low power and low cost


Using Bloom filters for String Matching

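[Figure: a window of W bytes, b1 ... bW, slides over the payload; a byte enters at bW and leaves at b1. Bloom filters BF3, BF4, BF5, ..., BFW inspect the window, and their matches go to the False Positives Resolver, which checks the Hash Table.]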
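A software sketch of the sliding-window scan, assuming one Bloom filter per signature length and an exact set standing in for the off-chip hash table. The signature strings, filter size and hashing scheme are all hypothetical; the hardware checks every filter in parallel on each clock, whereas this loop does it sequentially.

```python
import hashlib

SIGNATURES = [b"worm", b"virus", b"bad"]   # hypothetical strings
M_BITS, K = 512, 4


def positions(item: bytes):
    """K bit positions for item (salted SHA-256; an illustrative choice)."""
    for i in range(K):
        digest = hashlib.sha256(bytes([i]) + item).digest()
        yield int.from_bytes(digest[:8], "big") % M_BITS


# One Bloom filter per signature length, plus an exact table standing in
# for the off-chip hash table that resolves false positives.
filters = {}
for sig in SIGNATURES:
    bits = filters.setdefault(len(sig), bytearray(M_BITS))
    for pos in positions(sig):
        bits[pos] = 1
exact = set(SIGNATURES)


def scan(payload: bytes):
    """Slide a window one byte at a time; return (offset, string) matches."""
    found = []
    for start in range(len(payload)):
        for length, bits in filters.items():
            window = payload[start:start + length]
            if len(window) < length:
                continue
            if all(bits[p] for p in positions(window)):   # Bloom filter hit
                if window in exact:                        # resolve the hit
                    found.append((start, window))
    return found


print(scan(b"a bad worm here"))  # [(2, b'bad'), (6, b'worm')]
```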


Single Bloom Filter Engine

[Figure: one engine, consisting of the Bloom filters BF3, BF4, BF5, ..., BFW, the False Positives Resolver and the Hash Table.]


Using Multiple Engines

[Figure: several Bloom filter engines share one Hash Table through a Second Level Arbitration block.]


System Throughput

● G : Number of engines (4)

● B : Number of Bloom filters in each engine (32)

● f : False-positive probability of each Bloom filter

● F : Clock frequency (100 MHz)
  - Conservative for existing FPGAs, SRAMs & SDRAMs

● τ : Time required to probe the hash table
  - 20 clock cycles to read a burst of 32 bytes from SDRAM

● p : Frequency at which a true signature appears
  - Typically: 1/1000 - 1/100

$\text{throughput} = \frac{G}{(1 - p)\, G B f\, \tau + p\, \tau + \frac{1}{F}}$ bytes/s

The three terms of the denominator are, in order:
  - Time spent in resolving false positives with off-chip memory
  - Time spent in confirmation of a true match
  - Time spent examining window of packets on-chip

$f = (1/2)^{(M/N) \ln 2}$, with N = 10,000


Throughput as function of on-chip memory size

[Figure: throughput of the system in Gigabits per second (y-axis, 0 to 3.5) versus on-chip memory available for Bloom filters in Megabits (x-axis, 0 to 1.4), with one curve for p = 0.001 and one for p = 0.01, where p is the probability of occurrence of a true match (one per 1000 or one per 100 characters). Annotations: "Throughput not limited by false positives" and "Throughput more than OC-48".]


Implementation of Bloom filter on FPGA

m = 5 x 4096 = 20480
k = 5 x 2 = 10
k = (m/n) ln2 => n = 1419
f = (1/2)^10 = 0.0009765

[Figure: FPGA implementation. A Hash Value Generator takes X and addresses an m-bit vector built from Block RAMs. Each dual-port Block RAM is organized as 4096 bits x 1 bit, with 2 address ports and 2 data ports.]
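The parameter arithmetic on this slide can be checked with a few lines of Python. The constant names are introduced here for readability; the values are the ones on the slide.

```python
import math

BRAM_BITS = 4096        # one block RAM used as a 4096 x 1-bit memory
BRAMS = 5               # block RAMs per Bloom filter
PORTS = 2               # dual-port: two hash lookups per block RAM

m = BRAMS * BRAM_BITS                 # 20480 bits
k = BRAMS * PORTS                     # 10 hash functions
n = math.floor(m * math.log(2) / k)   # strings for which k = 10 is optimal
f = 0.5 ** k
print(m, k, n, f)                     # 20480 10 1419 0.0009765625
```

The number of hash functions is fixed by the memory organization (two ports on each of five block RAMs), so n is the quantity that gets solved for.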


Instantiation of mini-Bloom filters

[Figure: a mini-Bloom filter consists of a Hash Value Generator and Block RAMs; a Wrapper holds an array of mini-Bloom filters.]

● Array of mini-Bloom filters: distributes the strings uniformly over a set of mini-Bloom filters

● Can support 10,000 strings with false positive probability of 0.00097, using 35 on-chip Block RAMs


Partial Bloom Filter

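[Figure: Partial Bloom Filter (PBF). A Hash Value Calculator computes H1(X) and H2(X) from X. The two values drive ports A and B of one 4096-bit dual-port Block RAM (signals addrA, weA, dinA, doutA and addrB, weB, dinB, doutB). A Request Decoder supplies BRAM #, Address, Bit and Valid. The output is 1 bit (match/no match).]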


Bloom Filter

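[Figure: a Bloom filter built from five partial Bloom filters, PBF 1 to PBF 5. A Hash Value Calculator computes H1 to H10 from X, two hash values per PBF; the PBF outputs are combined into Match. A Control Interface is attached to the filter.]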


System Overview

[Figure: system block diagram with the blocks Protocol Wrappers, Control Packet Processor, Bloom Filter, Hash Table Interface, SDRAM Controller and SDRAM, plus Input and Output controllers.]