Beyond Bloom Filters: From Approximate Membership Checks to Approximate State Machines By F. Bonomi et al. Presented by Kenny Cheng, Tonny Mak Yui Kuen

2

Introduction

A) Motivation

B) Objectives

C) Problem statements

3

A) Motivation

• Increasing trend to keep flow state in routers

• Storing state for a large number of flows needs a lot of memory (~100 bits per flow)

• If the memory footprint can be reduced, the state fits in fast on-chip memory, which improves performance

4

B) Objectives

• Introduce the idea of an Approximate Concurrent State Machine (ACSM), which sacrifices some accuracy for a smaller memory size

• Introduce and compare several solutions to the ACSM problem

• Find the approach with the best accuracy-to-memory ratio

5

C) Problem statements

• Describe 3 techniques based on Bloom filters and hashing, and evaluate them using both theoretical analysis and simulation

6

Bloom Filter

• A data structure proposed by Bloom in 1970

• Designed for membership tests, i.e. to test whether an element exists in a set

• Fast and compact

• Chance of false positives, i.e. an element not in the set may be wrongly reported as present

• No false negatives, i.e. an element in the set is always reported as present

7

How a Bloom Filter Works

• A bit array with all zeros initially

• k hash functions

[Figure: k hash functions (1, 2, 3, ..., k) above a bit array initialized to all zeros]

8

How a Bloom Filter Works

• Hash the element using the hash functions, get k indices in the bit array

• Set those bits to 1

[Figure: Insertion of x. Bit array before: 0 0 0 0 0 0 0 0 0 0 0 0 0 0; after: 0 0 1 0 0 0 0 0 1 1 0 0 0 1]

9

How a Bloom Filter Works

• Hash the element using the hash functions

• If all corresponding bits are 1, the element is reported as in the set (this may be a false positive)

[Figure: Lookup of x. Bit array, unchanged by the lookup: 0 0 1 0 0 1 0 0 1 1 1 0 0 1]
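The insertion and lookup steps on the slides above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `BloomFilter`, the parameters `m` and `k`, and the trick of deriving the k hash functions by salting SHA-256 are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter with m bits and k hash functions (illustrative names)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m  # all zeros initially

    def _indices(self, item):
        # Assumption: simulate k hash functions by salting SHA-256 with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item):
        # Set the k hashed positions to 1.
        for idx in self._indices(item):
            self.bits[idx] = 1

    def lookup(self, item):
        # True if all k positions are 1; may be a false positive,
        # but an inserted item is never missed.
        return all(self.bits[idx] for idx in self._indices(item))


bf = BloomFilter(m=64, k=3)
bf.insert("x")
print(bf.lookup("x"))
```

An item that was inserted always passes `lookup`; an item that was never inserted passes only if other insertions happen to have set all k of its positions.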

10

How a Bloom Filter Works

• Deletion is not supported

• A bit may also be used by other elements, so it cannot simply be cleared

[Figure: Deletion of x. Bit array before: 0 0 1 0 0 1 0 0 1 1 1 0 0 1; after: 0 0 ? 0 0 1 0 0 ? ? 1 0 0 ? (the bits of x cannot be safely cleared)]

11

Counting Bloom Filter

• Use a counter to replace each bit

• For insertion, increment the counters

• For deletion, decrement the counters

• Problems: more space, counter overflow

[Figure: Insertion of x into a counting Bloom filter. Counters before: 0 0 0 0 1 0 0 0 0 0 3 0 0 2; after: 0 0 1 0 1 0 0 0 1 1 3 0 0 3]
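A counting Bloom filter along the lines of this slide might look like the sketch below. The names (`CountingBloomFilter`, `max_count`) and the choice to saturate counters rather than let them overflow are assumptions for illustration; the slide only names overflow as a problem and does not say how to handle it.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter with small counters instead of bits (illustrative sketch)."""

    def __init__(self, m, k, max_count=15):
        self.m, self.k = m, k
        self.max_count = max_count  # assumption: 4-bit counters that saturate
        self.counters = [0] * m

    def _indices(self, item):
        # Assumption: k hash functions simulated by salting SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item):
        for idx in self._indices(item):
            if self.counters[idx] < self.max_count:  # saturate, do not wrap
                self.counters[idx] += 1

    def lookup(self, item):
        return all(self.counters[idx] > 0 for idx in self._indices(item))

    def delete(self, item):
        # Only delete items that appear to be present.
        if not self.lookup(item):
            return False
        for idx in self._indices(item):
            # A saturated counter has lost its exact value, so leave it alone.
            if 0 < self.counters[idx] < self.max_count:
                self.counters[idx] -= 1
        return True
```

The extra space is visible directly: each cell now costs several bits (four here) instead of one.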

12

3 Approaches to ACSM

• Approaches:
    1. Direct Bloom Filter
    2. Stateful Bloom Filter
    3. Fingerprint-compressed Filter

• Operations to implement:
    1. Insert(flow, state)
    2. Lookup(flow), which returns the state
    3. Delete(flow)
    4. Update(flow, new_state)

13

Direct Bloom Filter Approach

• Use a counting Bloom filter

• 4 operations:
    Insert: insert the (flow_id, state) pair
    Lookup: if the state is not provided, look up (flow_id, state) for every possible state; return “don’t know” if more than one state is found
    Delete: look up the flow, then decrement its counters
    Update: delete the old pair, then insert the new one

• Improvement: use timing-based deletion to handle flows that never terminate
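The four operations of the Direct Bloom Filter approach can be sketched as follows. This is an illustrative reading of the slide, not the paper's code: the class name `DirectBloomACSM`, the `DONT_KNOW` marker, the SHA-256-based hashing, and the decision to skip deletion when the lookup is ambiguous are assumptions.

```python
import hashlib

DONT_KNOW = "DK"  # hypothetical marker for an ambiguous lookup


class DirectBloomACSM:
    """Counting Bloom filter keyed on (flow_id, state) pairs (sketch)."""

    def __init__(self, m, k, states):
        self.m, self.k = m, k
        self.states = list(states)  # the full set of possible states
        self.counters = [0] * m

    def _indices(self, flow_id, state):
        # Assumption: k hash functions simulated by salting SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{flow_id}:{state}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def _present(self, flow_id, state):
        return all(self.counters[i] > 0 for i in self._indices(flow_id, state))

    def insert(self, flow_id, state):
        for i in self._indices(flow_id, state):
            self.counters[i] += 1

    def lookup(self, flow_id):
        # The state is not stored, so every possible state is tried.
        found = [s for s in self.states if self._present(flow_id, s)]
        if not found:
            return None
        return found[0] if len(found) == 1 else DONT_KNOW

    def delete(self, flow_id):
        state = self.lookup(flow_id)
        if state is None or state == DONT_KNOW:
            return False  # assumption: leave ambiguous flows to timed deletion
        for i in self._indices(flow_id, state):
            self.counters[i] -= 1
        return True

    def update(self, flow_id, new_state):
        self.delete(flow_id)  # delete old pair
        self.insert(flow_id, new_state)  # insert new pair
```

Note that `lookup` costs one Bloom filter probe per possible state, which is the inefficiency the Stateful Bloom Filter approach is meant to remove.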

14

Timing-based Deletion

• Add a timing bit to each cell

• Set the bit whenever the cell is touched

• Periodically clear the cells whose timing bit is not set, then reset all timing bits

• Alternative to DBF: use a standard Bloom filter instead of a counting one, and delete elements only by timing-based deletion

[Figure: Timing-based deletion. Cells before the sweep: 0 0 3 3 0 1 2 0 1 1 0 1 0 2; timing bits (set by x): 0 0 1 0 0 0 0 0 1 1 0 0 0 1; cells after the sweep: 0 0 3 0 0 0 0 0 1 1 0 0 0 2; timing bits after the reset: all zeros]
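The sketch below shows timing-based deletion on a standard (non-counting) Bloom filter, which is the alternative mentioned on this slide. The names `TimedBloomFilter` and `sweep` are made up for the example, and treating a successful lookup as a "touch" is an assumption; the slide does not spell out which operations count.

```python
import hashlib


class TimedBloomFilter:
    """Standard Bloom filter with one timing bit per cell (sketch)."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m
        self.touched = [0] * m  # timing bits

    def _indices(self, item):
        # Assumption: k hash functions simulated by salting SHA-256.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big")
            % self.m
            for i in range(self.k)
        ]

    def insert(self, item):
        for idx in self._indices(item):
            self.bits[idx] = 1
            self.touched[idx] = 1

    def lookup(self, item):
        idxs = self._indices(item)
        hit = all(self.bits[idx] for idx in idxs)
        if hit:
            # Assumption: a successful lookup refreshes the cells.
            for idx in idxs:
                self.touched[idx] = 1
        return hit

    def sweep(self):
        # Periodic pass: clear untouched cells, then reset every timing bit.
        for idx in range(self.m):
            if not self.touched[idx]:
                self.bits[idx] = 0
            self.touched[idx] = 0
```

An element survives as long as it is touched at least once between two consecutive sweeps; an idle element disappears after at most two sweeps.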

15

Stateful Bloom Filter Approach

• The Direct Bloom Filter does not store the state of a flow, so a lookup has to try every state

• Improvement: add a state value to each cell for faster lookup

• Hash flow_id only, instead of the (flow_id, state) pair

• Introduce a “don’t know” (DK) state for when a collision occurs

• Keep timing-based deletion

16

Stateful Bloom Filter Approach

• Insert, modify, delete: similar to the Direct Bloom Filter, except that the cell value is set to DK on a collision (counter > 1)

• Lookup:
    If all cells are DK, return DK
    If every cell is either state i or DK, return state i
    If the cells hold more than one state other than DK, return “not found”
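The lookup rules above translate fairly directly into code. The following is a sketch under stated assumptions: the class name `StatefulBloomACSM` is hypothetical, hashing uses salted SHA-256, an empty cell is treated as "not found", and, following the slide's wording, any second flow landing in a cell turns it into DK.

```python
import hashlib

DK = "DK"  # the "don't know" state


class StatefulBloomACSM:
    """Each cell holds a counter and a state value; only flow_id is hashed."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.count = [0] * m
        self.state = [None] * m

    def _indices(self, flow_id):
        # A set, so a flow hashing twice to one cell does not collide with itself.
        return {
            int.from_bytes(hashlib.sha256(f"{i}:{flow_id}".encode()).digest()[:8], "big")
            % self.m
            for i in range(self.k)
        }

    def insert(self, flow_id, state):
        for i in self._indices(flow_id):
            self.count[i] += 1
            # Collision (counter > 1) turns the cell into DK.
            self.state[i] = state if self.count[i] == 1 else DK

    def lookup(self, flow_id):
        values = [self.state[i] for i in self._indices(flow_id)]
        if any(v is None for v in values):
            return None  # assumption: an empty cell means "not found"
        known = {v for v in values if v != DK}
        if not known:
            return DK  # all cells are DK
        if len(known) == 1:
            return known.pop()  # every cell is state i or DK
        return None  # more than one non-DK state: "not found"

    def update(self, flow_id, new_state):
        for i in self._indices(flow_id):
            if self.count[i] == 1:  # DK cells are left as they are
                self.state[i] = new_state

    def delete(self, flow_id):
        for i in self._indices(flow_id):
            if self.count[i] > 0:
                self.count[i] -= 1
                if self.count[i] == 0:
                    self.state[i] = None
                # A cell that drops back to count 1 stays DK in this sketch.
```

Compared with the Direct Bloom Filter sketch, `lookup` needs a single probe regardless of how many states the machine has.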

17

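Fingerprint-compressed Filter Approach

• Store a fingerprint of flow + state in a d-left hash table

[Figure: element x is hashed to one bucket in each of the d subtables (1, 2, ..., d). Each bucket holds (fingerprint, state) entries, e.g. (1001010110, 1), (1100110000, 4), (0110111010, 2), (0111010100, 1), (1110011101, 3), (1100000110, 3), (0000111101, 3). The entry for x is (1110001000, 1).]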

18

Fingerprint-compressed Filter Approach

• Insert: hash the element to find the corresponding bucket in each hash table, then insert the fingerprint + state into the bucket with the fewest elements (break ties by choosing the left-most one)

• Lookup: retrieve the state stored with the fingerprint

• Delete: remove the fingerprint

• Update: update in place, or remove the old entry and add the new one

• Return DK when a fingerprint is found in multiple buckets

• Timing-based deletion can still be applied
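A simplified version of the fingerprint-compressed filter can be sketched with d subtables of buckets. Several things here are assumptions made to keep the example short: the class name `FingerprintACSM`, the 16-bit fingerprint width, salted SHA-256 for hashing, the use of one fingerprint across all subtables, and unbounded Python dicts in place of fixed-capacity hardware buckets.

```python
import hashlib

DK = "DK"  # returned when a fingerprint shows up in more than one bucket


class FingerprintACSM:
    """d-left hash table storing (fingerprint -> state) per bucket (sketch)."""

    def __init__(self, d, buckets, fp_bits=16):
        self.d, self.buckets, self.fp_bits = d, buckets, fp_bits
        # tables[t][b] is bucket b of subtable t, mapping fingerprint -> state.
        self.tables = [[{} for _ in range(buckets)] for _ in range(d)]

    def _hash(self, salt, flow_id):
        digest = hashlib.sha256(f"{salt}:{flow_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def _slots(self, flow_id):
        fp = self._hash("fp", flow_id) % (1 << self.fp_bits)
        cells = [
            self.tables[t][self._hash(t, flow_id) % self.buckets]
            for t in range(self.d)
        ]
        return fp, cells  # cells are ordered left to right

    def insert(self, flow_id, state):
        fp, cells = self._slots(flow_id)
        # min() returns the first minimum, so ties go to the left-most bucket.
        target = min(cells, key=len)
        target[fp] = state

    def lookup(self, flow_id):
        fp, cells = self._slots(flow_id)
        hits = [cell[fp] for cell in cells if fp in cell]
        if not hits:
            return None
        return hits[0] if len(hits) == 1 else DK

    def update(self, flow_id, new_state):
        fp, cells = self._slots(flow_id)
        for cell in cells:
            if fp in cell:
                cell[fp] = new_state  # direct update in place

    def delete(self, flow_id):
        fp, cells = self._slots(flow_id)
        for cell in cells:
            cell.pop(fp, None)
```

In this sketch two flows with the same fingerprint in the same bucket overwrite each other; a real design has to account for such fingerprint collisions when sizing the fingerprint.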

19

Simulation

• To investigate the size/accuracy trade-off for the 3 approaches

• State machine: 10 states

• Legal state changes: 1 → 2 → 3 → … → 10

• Run for 1 million flows

• About 60,000 simultaneous flows

• 100 ± 40 packets for each flow

• Some packets trigger a state change

20

Simulation

• 3 kinds of simulation flows

• Interesting flows (30%) – flows with legal state changes only, always complete

• Noise flows (30%) – flows with random (can be legal or illegal) state changes, never complete

• Random flows (40%) – flows without state change

21

Simulation

False positive rate: percentage of flows reported as completed that are not interesting

False negative rate: percentage of interesting flows that are not reported as completed
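As a worked example of these two definitions, the helper below computes both rates from per-flow outcomes. The function name `error_rates` and the input format (pairs of booleans) are assumptions for illustration, and the numbers in the usage example are made up, not simulation results.

```python
def error_rates(flows):
    """flows: iterable of (is_interesting, reported_completed) booleans.

    Returns (false_positive_rate, false_negative_rate) as fractions.
    """
    flows = list(flows)
    completed = [f for f in flows if f[1]]
    interesting = [f for f in flows if f[0]]
    # False positives: completed flows that are not interesting.
    fp = (
        sum(1 for is_int, _ in completed if not is_int) / len(completed)
        if completed
        else 0.0
    )
    # False negatives: interesting flows that were not reported as completed.
    fn = (
        sum(1 for _, done in interesting if not done) / len(interesting)
        if interesting
        else 0.0
    )
    return fp, fn


# Made-up outcomes: 3 interesting flows completed, 1 interesting flow missed,
# 1 non-interesting flow wrongly completed, 5 non-interesting flows ignored.
sample = [(True, True)] * 3 + [(True, False)] + [(False, True)] + [(False, False)] * 5
print(error_rates(sample))  # (0.25, 0.25)
```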

22

Applications

Where ACSMs fit in application-level QoS:

• Video congestion control

• Peer-to-Peer (P2P) traffic identification

23

Video congestion control

• Apply to MPEG video streaming

• 3 kinds of frames in MPEG video:
    I frame: scene information
    P frame: differential information
    B frame: least important information

• Can drop B frames up to 30% with acceptable quality

• Need to keep track of current frame

24

Video congestion control

• Use FCF ACSM to keep track of state

• Experimentally the highest false positive rate acceptable is 0.37%

• This requires a memory size of 27 bits per flow (about ¼ of the original 100 bits)

25

P2P Traffic Identification

• Goal: limit P2P flows to improve quality for other applications

• One possible way to identify a P2P flow: concurrent TCP and UDP flows

• Use ACSM for real-time P2P identification

26

Conclusion

• ACSMs are feasible

• The FCF approach is the best of the three

• Two potential applications are introduced for ACSM

• ACSM may be beneficial to QoS applications, which are fault-tolerant

27

Comments

• Authors focus on accuracy and memory size, but not real performance

• FCF approach may not perform well on hardware

- End -

Question & Answer