Data Structures - 08. Hash Tables, II

DESCRIPTION

Slides for the Data Structures course, taught by Prof. Dr. Christian Pagot at the Universidade Federal da Paraíba.

Universidade Federal da Paraíba, Centro de Informática

Hash Tables II
Lecture 11

1107186 – Data Structures (Estrutura de Dados) – Class 02

Prof. Christian Azambuja Pagot
CI / UFPB

Arbitrary Hash Functions

● Suppose the following hash table structure:
  – 10 buckets (from 0 to 9).
  – Collisions resolved through separate chaining.
● Suppose the following key source:
  – Keys are integers that are (for some obscure reason) multiples of 5.
● Suppose the following hash and compression functions:

hash(x) = x
compressed hash(y) = y mod 10

Arbitrary Hash Functions

● Results:

Key    Compressed hash
5      5
10     0
15     5
20     0
25     5
30     0
35     5
40     0
45     5
50     0

The number of buckets (10) and the keys (multiples of 5) have a factor in common, so only buckets 0 and 5 are ever used!
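The table above can be reproduced with a few lines of Python. This is only an illustrative sketch: the names `compressed_hash` and `bucket_usage` are made up for this example, and the comparison with 11 buckets is an added assumption (a bucket count that shares no factor with 5), not part of the original slide.

```python
from collections import Counter


def compressed_hash(key, n_buckets):
    """hash(x) = x, followed by the compression y mod n_buckets."""
    return key % n_buckets


def bucket_usage(keys, n_buckets):
    """Count how many keys land in each bucket."""
    return Counter(compressed_hash(k, n_buckets) for k in keys)


keys = list(range(5, 55, 5))  # 5, 10, ..., 50

# 10 buckets: every key lands in bucket 0 or bucket 5.
print(sorted(bucket_usage(keys, 10).items()))

# 11 buckets (assumed alternative): no factor shared with the keys.
print(len(bucket_usage(keys, 11)), "distinct buckets used")
```

With 10 buckets the ten keys pile up in two chains of five; with 11 buckets each of these ten keys gets its own bucket.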

Arbitrary Hash Functions

● Hashing strings:
  – The straightforward approach is to sum the ASCII values of the characters.
  – However, since words are reasonably short, the sum won't be that large!

The hashes of thousands of words will be concentrated in the first buckets of a large table!
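A small sketch of the character-sum approach makes the problem concrete. The function name `ascii_sum_hash` and the sample words are assumptions chosen for illustration; they do not come from the slides.

```python
def ascii_sum_hash(word):
    """Sum the character codes of the word (ASCII values for ASCII text)."""
    return sum(ord(c) for c in word)


# Anagrams always collide, because addition ignores character order.
for word in ("listen", "silent", "enlist"):
    print(word, ascii_sum_hash(word))

# The sum grows only linearly with the word length: a lowercase word of
# 10 letters can never hash above 10 * ord("z").
print(ascii_sum_hash("z" * 10))
```

Besides the small range of values, the example shows a second weakness of this scheme: any two words made of the same letters receive the same hash.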

Good Hash Functions

● In a good hash function, any key is equally likely to hash to any bucket:
  – This minimizes collisions!
● How well it works also depends on the distribution of the keys:
  – Which we usually do not know!

The Division Method

● A key k is mapped into one of the N buckets by taking the remainder of k divided by N:

h(k) = k mod N

k    k mod 2    k mod 3    k mod 4    k mod 5
1    1          1          1          1
2    0          2          2          2
3    1          0          3          3
4    0          1          0          4
5    1          2          1          0
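As a minimal sketch, the division method is a single modulo operation. The function name `division_hash` is an assumption made for this example; the loop simply regenerates the rows of the table above.

```python
def division_hash(k, n_buckets):
    """Division method: h(k) = k mod N."""
    return k % n_buckets


# One row per key k, one column per table size N = 2, 3, 4, 5.
for k in range(1, 6):
    print(k, *(division_hash(k, n) for n in (2, 3, 4, 5)))
```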

The Division Method

● Good candidates for N are prime numbers not too close to a power of two:
  – Suppose we want to create a hash table to store 2000 items.
  – We don't mind examining about 3 elements in an unsuccessful search.
  – A good value for N is 701, a prime near 2000/3 ≈ 667 that is not close to a power of two (512 or 1024):

h(k) = k mod 701
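The arithmetic behind this choice can be checked with a short sketch. It assumes separate chaining, where an unsuccessful search examines on average as many elements as the load factor (items per bucket) under uniform hashing; `is_prime` is a helper written for this example, not something from the slides.

```python
def is_prime(n):
    """Trial division; adequate for small table sizes like the one here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


n_items, n_buckets = 2000, 701

load_factor = n_items / n_buckets  # average chain length
print(is_prime(n_buckets))         # 701 has no divisor up to its square root
print(round(load_factor, 2))       # just under the 3 elements we accept
print(n_buckets - 512, 1024 - n_buckets)  # distance to nearby powers of two
```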

The Multiplication Method

● Operates in two steps:
  – 1) Multiply the key by a constant A (0 < A < 1) and extract the fractional part.
  – 2) Multiply the result by N (number of buckets) and take the floor.

h(k) = ⌊N (kA mod 1)⌋

The Multiplication Method

k     ⌊N(kA mod 1)⌋       ⌊N(kA mod 1)⌋
      N=5, A=0.1          N=5, A=0.6180
1     0                   3
2     1                   1
3     1                   4
4     2                   2
5     2                   0
6     3                   3
7     3                   1
8     4                   4
9     4                   2
10    0                   0
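The two columns above can be recomputed with the sketch below. The function name `multiplication_hash` is an assumption for this example. Exact rational arithmetic (`fractions.Fraction`) is used instead of floats as a deliberate choice, so that values such as 0.1 are represented exactly and the floor is never affected by rounding error.

```python
from fractions import Fraction
from math import floor


def multiplication_hash(k, n_buckets, a):
    """Multiplication method: h(k) = floor(N * (k*A mod 1))."""
    fractional_part = (k * a) % 1          # step 1: fractional part of k*A
    return floor(n_buckets * fractional_part)  # step 2: scale by N, take floor


for a in (Fraction("0.1"), Fraction("0.6180")):
    column = [multiplication_hash(k, 5, a) for k in range(1, 11)]
    print(f"A = {float(a)}: {column}")
```

With A = 0.1 consecutive keys fall into the same or adjacent buckets, while A = 0.6180 scatters consecutive keys across the table.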

Universal Hashing

● A randomized algorithm H for constructing hash functions h : U → {1, …, N} is universal if for all x ≠ y in U, we have

Pr[h(x) = h(y)] ≤ 1/N

Universal Hashing

● Example
  – Choose a prime number p large enough so that any key k is in Z_p = {0, 1, ..., p-1}.
  – Z*_p = {1, ..., p-1}.
  – a is a number chosen at random from Z*_p and b is a number chosen at random from Z_p.
  – A universal hash function h_{a,b} can then be defined as

h_{a,b}(k) = ((ak + b) mod p) mod N
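A minimal sketch of this family in Python follows. The names `h_ab` and `make_universal_hash` are assumptions for illustration, and the small values p = 17, N = 6, a = 3, b = 4 are picked only to keep the arithmetic easy to follow by hand. Buckets are numbered from 0 to N-1 here, as in the earlier slides.

```python
import random


def h_ab(k, a, b, p, n_buckets):
    """h_{a,b}(k) = ((a*k + b) mod p) mod N for fixed a and b."""
    return ((a * k + b) % p) % n_buckets


def make_universal_hash(p, n_buckets, rng=random):
    """Draw a and b at random and return the resulting hash function."""
    a = rng.randrange(1, p)  # a in Z*_p = {1, ..., p-1}
    b = rng.randrange(0, p)  # b in Z_p  = {0, ..., p-1}
    return lambda k: h_ab(k, a, b, p, n_buckets)


# Worked by hand: (3*8 + 4) mod 17 = 28 mod 17 = 11, and 11 mod 6 = 5.
print(h_ab(8, a=3, b=4, p=17, n_buckets=6))

# A randomly drawn member of the family; keys must lie in {0, ..., p-1}.
h = make_universal_hash(p=17, n_buckets=6)
print([h(k) for k in range(17)])
```

Drawing a fresh function each time a table is built means no fixed set of keys is bad for every run, which is the point of randomizing the choice.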

Perfect Hashing

● A hash function is perfect if the complexity of a search is O(1) (constant) in the worst case.

● Perfect hashing is accomplished by using universal hashing in two levels.
  – In the first level, a hash function selected from a family of universal hash functions is used.
  – Collisions are resolved by inserting the keys into a secondary hash table with its own associated hash function.
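The two-level idea can be sketched as follows, under several assumptions that the slides do not state: the key set is fixed in advance, all keys are integers in [0, P) for the prime P = 2^31 - 1, each secondary table has size equal to the square of the number of keys in its bucket, and the secondary function is redrawn until it is collision-free. The class name `PerfectHashSet` and helper `random_hash` are hypothetical.

```python
import random

P = 2_147_483_647  # the prime 2**31 - 1; assumes every key is in [0, P)


def random_hash(n_buckets, rng):
    """Draw h(k) = ((a*k + b) mod P) mod n_buckets from the universal family."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda k: ((a * k + b) % P) % n_buckets


class PerfectHashSet:
    """Static set of integer keys with worst-case O(1) membership tests."""

    def __init__(self, keys, seed=0):
        rng = random.Random(seed)
        keys = list(set(keys))
        n = max(1, len(keys))
        # First level: one universal hash function over n buckets.
        self.h1 = random_hash(n, rng)
        groups = [[] for _ in range(n)]
        for k in keys:
            groups[self.h1(k)].append(k)
        # Second level: bucket j with n_j keys gets a table of n_j**2 slots.
        # Redraw its hash function until no two keys share a slot.
        self.secondary = []
        for group in groups:
            size = len(group) ** 2
            while True:
                h2 = random_hash(size, rng) if size else None
                slots = [None] * size
                collision = False
                for k in group:
                    i = h2(k)
                    if slots[i] is not None:
                        collision = True
                        break
                    slots[i] = k
                if not collision:
                    break
            self.secondary.append((h2, slots))

    def __contains__(self, key):
        # Exactly two hash evaluations and one comparison per lookup.
        h2, slots = self.secondary[self.h1(key)]
        return bool(slots) and slots[h2(key)] == key


table = PerfectHashSet([10, 22, 37, 40, 52, 60, 70, 72, 75])
print(37 in table, 11 in table)
```

Because each secondary table is quadratically larger than the number of keys it holds, a randomly drawn function is collision-free with probability greater than 1/2, so the redraw loop is expected to finish after a couple of attempts. This sketch does not try to bound the total space used by the secondary tables.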