Peacock Hash: Deterministic and Updatable Hashing for High Performance Networking
Sailesh Kumar, Jonathan Turner, Patrick Crowley
2 - Sailesh Kumar - 04/20/23
Overview
Overview of Hash Tables and Segmented Hash Table
Analysis and Limitations
» Increased memory references
Adding Bloom Filters per segment
Selective Filter Insertion Algorithm
Simulation Results and Analysis
Conclusion
Hash Tables
Consider the problem of searching an array for a given value
» If the array is not sorted, the search requires O(n) time
» If the array is sorted, we can do a binary search – O(lg n) time
» Can we do it in O(1) time?
– Hash table
– Use a hash function to map elements to table cells
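The idea above can be sketched as follows. This is a minimal illustrative example, not the paper's implementation; the table size and the use of Python's built-in `hash` are arbitrary choices.

```python
# A minimal direct-mapped hash table: a key's cell is computed in O(1),
# independent of the number of stored elements.
TABLE_SIZE = 10

def bucket_index(key: str) -> int:
    # Map a key to a table cell using a hash function.
    return hash(key) % TABLE_SIZE

table = [None] * TABLE_SIZE
table[bucket_index("apple")] = "apple"

# Lookup is a single index computation, not a scan of the array.
assert table[bucket_index("apple")] == "apple"
```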
Hash Tables
Suppose our hash function gave us the following values:
» hash("apple") = 5
» hash("watermelon") = 3
» hash("grapes") = 8
» hash("cantaloupe") = 7
» hash("kiwi") = 0
» hash("strawberry") = 9
» hash("mango") = 6
» hash("banana") = 2
» hash("honeydew") = 6
This is called a collision
» Now what?

[Figure: a 10-cell table with kiwi in cell 0, banana in 2, watermelon in 3, apple in 5, mango in 6, cantaloupe in 7, grapes in 8, and strawberry in 9; "honeydew" also hashes to cell 6, which mango already occupies.]
Collision Resolution Policies
Linear Probing
» Successively search subsequent table entries for the first empty one
Linear Chaining
» Link all entries that collide at a bucket into a linked list
Double Hashing
» Uses a second hash function to successively index the table
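The first policy above can be sketched as follows. This is an illustrative linear-probing insert and search, with an arbitrary table size; it is not the paper's code, and for brevity it assumes the table never becomes completely full.

```python
# Linear probing: on a collision, scan forward (wrapping around) to the
# first empty cell. Search follows the same probe sequence and stops at
# the first empty cell.
TABLE_SIZE = 8
table = [None] * TABLE_SIZE

def lp_insert(key: str) -> int:
    i = hash(key) % TABLE_SIZE
    while table[i] is not None:      # probe subsequent entries
        i = (i + 1) % TABLE_SIZE
    table[i] = key
    return i

def lp_search(key: str):
    i = hash(key) % TABLE_SIZE
    while table[i] is not None:      # an empty cell ends the probe chain
        if table[i] == key:
            return i
        i = (i + 1) % TABLE_SIZE
    return None

lp_insert("apple")
lp_insert("mango")
assert lp_search("apple") is not None
assert lp_search("pear") is None
```

Note that a key displaced by collisions sits several probes away from its home cell, which is exactly the "key distance" the next slide quantifies.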
Performance Analysis
Average performance is O(1)
However, worst-case performance is O(n)
In fact, the likelihood that a key sits at distance > 1 from its home bucket is fairly high

[Figure: probability that key distance exceeds 1 and exceeds 2, plotted against load m/n from 10% to 100%; the probabilities grow to roughly 0.4.]

Keys at distance > 1 take twice the time to be probed; keys at distance > 2 take three times as long. So there is a fairly high probability that throughput drops to one half or one third of the peak throughput.
Hashing in Network Processors
High query latency (memory access)
» Hide latency with multiple threads

[Figure: multiple query-request threads issuing lookups into an off-chip hash table.]
Segmented Hashing
Uses the power of multiple choices
» proposed earlier by Azar et al.
An N-way segmented hash
» Logically divides the hash table array into N equal segments
» Maps each incoming key onto one bucket in each segment
» Picks the bucket which is either empty or has the fewest keys

[Figure: a 4-way segmented hash table; key k_i hashes to one candidate bucket per segment and is placed in the least-loaded one, as is the next key k_i+1.]
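The insertion rule above can be sketched as follows. This is a simplified illustration, not the paper's implementation; the segment and bucket counts are arbitrary, and a salted built-in hash stands in for the per-segment hash functions.

```python
# N-way segmented hash: hash the key once per segment, then insert into
# the candidate bucket with the fewest keys across all segments.
N_SEGMENTS = 4
BUCKETS_PER_SEGMENT = 8

# Each segment is an array of buckets; each bucket is a collision chain.
segments = [[[] for _ in range(BUCKETS_PER_SEGMENT)]
            for _ in range(N_SEGMENTS)]

def seg_hash(key: str, seg: int) -> int:
    # One hash function per segment (salted hash as a stand-in).
    return hash((seg, key)) % BUCKETS_PER_SEGMENT

def seg_insert(key: str) -> None:
    candidates = [segments[s][seg_hash(key, s)] for s in range(N_SEGMENTS)]
    min(candidates, key=len).append(key)   # emptiest candidate wins

def seg_search(key: str) -> bool:
    # Without filters, a query probes one bucket in every segment.
    return any(key in segments[s][seg_hash(key, s)]
               for s in range(N_SEGMENTS))

for k in ["apple", "mango", "kiwi"]:
    seg_insert(k)
assert seg_search("kiwi")
assert not seg_search("pear")
```

Note that `seg_search` touches all N segments per query; that cost is exactly the deficiency the next slides address with per-segment Bloom filters.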
Segmented Hash Performance
More segments improve the probabilistic performance
» With 64 segments, the probability that a key is inserted at distance > 2 is nearly zero, even at 100% load
» The improvement in average-case performance is still modest

[Figure: probability that key distance exceeds 1 (left panel) and exceeds 2 (right panel) versus load m/n from 10% to 100%, for 1, 4, 8, 16, 32, and 64 segments; probabilities span 1E-15 to 1E+00.]
An Obvious Deficiency
Even when every key is at distance one, every query requires at least N memory probes
» Average probes are O(N), compared to O(1) for a naive table
– If the system is bandwidth limited, this means N times lower throughput
To ensure O(1) operations, the segmented hash table uses on-chip Bloom filters
» On-chip memory requirements are quite modest: 1-2 bytes per hash table bucket
Each segment has a Bloom filter, which supports membership queries
» These on-chip filters are queried before making an off-chip hash table memory reference
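The filtering step above can be sketched as follows. This is an illustrative model, not the paper's design: filter size, hash count, and the salted-hash construction are assumptions, and the bitmaps stand in for on-chip memory.

```python
# Per-segment Bloom filters gate off-chip probes: each segment keeps a
# small bit array; its hash-table bucket is probed only if the filter
# reports the key as (possibly) present.
N_SEGMENTS = 4
FILTER_BITS = 64
K_HASHES = 3

filters = [0] * N_SEGMENTS          # one Bloom-filter bitmap per segment

def filter_positions(key: str):
    # k bit positions per key (salted hash as a stand-in).
    return [hash((j, key)) % FILTER_BITS for j in range(K_HASHES)]

def filter_insert(seg: int, key: str) -> None:
    for b in filter_positions(key):
        filters[seg] |= 1 << b

def segments_to_probe(key: str):
    # Only segments whose filter matches cost a memory reference;
    # a match can be a false positive, but never a false negative.
    return [s for s in range(N_SEGMENTS)
            if all(filters[s] >> b & 1 for b in filter_positions(key))]

filter_insert(2, "apple")
assert 2 in segments_to_probe("apple")   # the true segment always matches
```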
Adding per Segment Filters
[Figure: key k_i can go to any of 3 candidate buckets; each segment has an m_b-bit Bloom filter indexed by hash functions h1(ki), h2(ki), …, hk(ki).]

We can select any of the three candidate segments and insert the key into the corresponding filter
False Positive Rates
With Bloom filters, there is a likelihood of false positives
» A filter might report that a key is present in its segment when the key is actually absent
With N segments, the false positive rate will clearly be at least N times higher
» In fact, it will be even higher, because we must also consider the permutations of false positives across segments
We propose the Selective Filter Insertion algorithm, which reduces the false positive rate by several orders of magnitude
Selective Filter Insertion Algorithm
[Figure: key k_i can go to any of 3 candidate buckets; each segment has an m_b-bit Bloom filter indexed by h1(ki), h2(ki), …, hk(ki).]

Insert the key into segment 4, since fewer bits are set there. Fewer set bits => lower false positive rate
With more segments (more choices), our algorithm sets far fewer bits in the Bloom filters
Selective Filter Insertion Results
[Figure: false positive probability versus bits per entry m_b/n_i (8 to 64), comparing a normal Bloom filter against Selective Filter Insertion, for k = 8 and for the optimum k; probabilities span 1E-11 to 1E-01.]
Selective Filter Insertion Details
First we build the set of segments into which the arriving key can be inserted; we call it {minSet}
» i.e., these segments have the minimum (and equal) collision-chain length at the corresponding hash index
A naive, greedy algorithm would choose the segment whose Bloom filter has the fewest bits set
» This leads to unbalanced segments
» An already loaded segment is likely to receive further keys, because its filter array is more likely to require fewer new bit transitions
» Our simulations suggest that an enhanced insertion algorithm reduces the false positive rate by up to a further order of magnitude
Selective Filter Insertion Enhancement
Our aim is to keep the segments balanced while also reducing the bit transitions in the Bloom filters
1. Label a segment in {minSet} eligible if its occupancy is less than (1+δ) times the occupancy of the least occupied segment. The parameter δ is typically set between 0.01 and 0.1.
2. If no segment is eligible, select the least occupied segment from {minSet}
3. Otherwise, choose the eligible segment with the minimum bit transitions
4. If multiple such segments exist, choose the least occupied one
5. If multiple such segments are still tied, break the tie with a round-robin arbitration policy
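The steps above can be sketched as follows. This is an illustrative selection function, not the paper's code: the input dictionaries (occupancy per segment, new filter-bit transitions per segment) are assumed precomputed, and the final round-robin tie-break is omitted for brevity.

```python
# Enhanced selection among {minSet}: prefer segments whose occupancy is
# within (1+delta) of the least occupied, then the fewest new filter-bit
# transitions, then the lowest occupancy.
def select_segment(min_set, occupancy, new_bit_transitions, delta=0.1):
    least = min(occupancy[s] for s in min_set)
    # Step 1: eligibility keeps the segments balanced.
    eligible = [s for s in min_set
                if occupancy[s] < (1 + delta) * least]
    if not eligible:
        # Step 2: fall back to the least occupied segment.
        return min(min_set, key=lambda s: occupancy[s])
    # Steps 3-4: fewest new bits set in the Bloom filter, then occupancy.
    return min(eligible,
               key=lambda s: (new_bit_transitions[s], occupancy[s]))

# Segments 0 and 1 are equally loaded, but inserting into segment 1
# would set fewer new filter bits, so it is chosen.
occ = {0: 10, 1: 10, 2: 30}
trans = {0: 3, 1: 1, 2: 0}
assert select_segment([0, 1, 2], occ, trans) == 1
```

Segment 2 is skipped despite setting zero new bits, because its occupancy exceeds the (1+δ) bound; that is the balancing behavior the enhancement adds over the naive greedy choice.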
Simulation Results
64K buckets, 32 bits/entry Bloom filter. The simulation runs for 500 phases.
» During every phase, 100,000 random searches are performed. Between phases, 10,000 random keys are deleted and inserted.

[Figure: average search time versus load (0-100%) for 1, 4, 16, and 64 segments, under Double Hashing (left panel) and Linear Chaining (right panel); average search time ranges from 1 to 1.5.]
Effectiveness of Modified Bloom Filters
Plotting average memory references at different successful search rates.
» Fewer memory references reflect the effectiveness of the filters. Load is kept at 80%.

[Figure: average memory references (1E-07 to 1E+01) versus probability of successful search (1E+00 down to 1E-10), comparing a single hash table, segmented hash with naive Bloom filters, and segmented hash with Selective Filter Insertion, at 32 filter bits per element (left panel) and 16 filter bits per element (right panel).]
Questions?