29
1 Hash table

1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

Embed Size (px)

Citation preview

Page 1: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

1

Hash table

Page 2: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

2

Objective

To learn: Hash function Linear probing Quadratic probing Chained hash table

Page 3: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

3

A basic problem

We have to store some records and perform the following:add new recorddelete recordsearch a record by key

Find a way to do these efficiently!

Page 4: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

4

Unsorted array

Use an array to store the records, in unsorted orderadd - add the records as the last entry fast

O(1)delete a target - slow at finding the target,

fast at filling the hole (just take the last entry) O(n)

search - sequential search slow O(n)

Page 5: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

5

Sorted array

Use an array to store the records, keeping them in sorted orderadd - insert the record in proper position.

much record movement slow O(n)delete a target - how to handle the hole after

deletion? Much record movement slow O(n)search - binary search fast O(log n)

Page 6: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

6

Linked list

Store the records in a linked list (sorted / unsorted) add - fast if one can insert node anywhere O(1)delete a target - fast at disposing the node, but

slow at finding the target O(n)search - sequential search slow O(n)

(if we only use linked list, we cannot use binary search even if the list is sorted.)

Page 7: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

7

Array as table

9903030

98020209801010

0056789

00123450033333

tom

marypeter

david

andybetty

73

10020

56.8

81.590

studid name score

9908080 bill 49

...

...

Consider this problem. We want to store 1000 student records and search them by student id.

Consider this problem. We want to store 1000 student records and search them by student id.

Page 8: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

8

Array as table

:33333

:12345

0:

:betty

:andy

:

:90:

81.5:

studid name score

56789 david 56.8

:9908080

::

:bill::

:49::

9999999

One ‘naive’ way is to store the records in a huge array (index 0..9999999). The index is used as the student id, i.e. the record of the student with studid 0012345 is stored at A[12345]

One ‘naive’ way is to store the records in a huge array (index 0..9999999). The index is used as the student id, i.e. the record of the student with studid 0012345 is stored at A[12345]

Page 9: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

9

Array as table

Store the records in a huge array where the index corresponds to the keyadd - very fast O(1)delete - very fast O(1)search - very fast O(1)

But it wastes a lot of memory! Not feasible.

Page 10: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

10

Hash function

function Hash(key: KeyType): integer;

Imagine that we have such a magic function Hash. It maps the key (stud_id) of the 1000 records into the integers 0..999, one to one. No two different keys maps to the same number.

Imagine that we have such a magic function Hash. It maps the key (stud_id) of the 1000 records into the integers 0..999, one to one. No two different keys maps to the same number.

H(‘0012345’) = 134H(‘0033333’) = 67H(‘0056789’) = 764…H(‘9908080’) = 3

Page 11: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

11

Hash table

:betty

:bill:

:90:

49:

studid name score

andy 81.5

::

david:

::

56.8:

:0033333

:9908080

:

0012345

::

0056789:

3

67

0

764

999

134

To store a record, we compute Hash(stud_id) for the record and store it at the location Hash(stud_id) of the array. To search for a student, we only need to peek at the location Hash(target stud_id).

To store a record, we compute Hash(stud_id) for the record and store it at the location Hash(stud_id) of the array. To search for a student, we only need to peek at the location Hash(target stud_id).

Page 12: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

12

Hash table with Perfect Hash

Such magic function is called perfect hashadd - very fast O(1)delete - very fast O(1)search - very fast O(1)

But it is generally difficult to design perfect hash. (e.g. when the potential key space is large)

Page 13: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

13

Hash function

A hash function maps a key to an index within in a range

Desirable properties:simple and quick to calculateeven distribution, avoid collision as much as

possible

function Hash(key: KeyType);

Page 14: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

14

Division Method

Certain values of m may not be good:

Good values for m are prime numbers which are not close to exact powers of 2. For example, if you want to store 2000 elements then m=701 (m = hash table length) yields a hash function:

h(k) = k mod m

h(key) = k mod 701

Page 15: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

15

Collision

For most cases, we cannot avoid collision Collision resolution - how to handle when

two different keys map to the same index

H(‘0012345’) = 134H(‘0033333’) = 67H(‘0056789’) = 764…H(‘9903030’) = 3H(‘9908080’) = 3

Page 16: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

16

The problem arises because we have two keys that hash in the same array entry, a collision. There are two ways to resolve collision:

Hashing with Chaining: every hash table entry contains a pointer to a linked list of keys that hash in the same entry

Hashing with Open Addressing: every hash table entry contains only one key. If a new key hashes to a table entry which is filled, systematically examine other table entries until you find one empty entry to place the new key

Hash Tables

Page 17: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

17

Open Addressing The key is first mapped to a slot:

If there is a collision subsequent probes are performed:

If the offset constant, c and m are not relatively prime, we will not examine all the cells. Ex.: Consider m=4 and c=2, then only every other slot is checked.

When c=1 the collision resolution is done as a linear search. This is known as linear probing.

)( index 10 ki h

0formod)(1 jmcii jj

0 1 2 3

Page 18: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

18

Linear Probing example1

89

18

89

49

18

89

49

58

18

89

49

58

9

18

89

Insert: 89, 18, 49, 58, 9 to table size=10, hash function is: %tablesize

0

1

2

3

4

5

6

7

8

9

Page 19: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

19

Linear Probing Example-2 Single character keys, table size, m=8 Hash function (map characters to range 0...7):

k: APQ BOR CNS DMT ELU FKN GJWZ HIXY

h1(k): 0 1 2 3 4 5 6 7

ActionStore AStore CStore DStore GStore PStore QDelete PDelete QLoad BLoad RLoad Q

0AAAAAAAAAAA

1

PP±±BBB

2

CCCCCCCCCC

3

DDDDDDDDD

4

QQ±±RR

5

Q

6

GGGGGGGG

# probes

111125251 4 6

7

Page 20: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

20

Choosing a Hash Function Notice that the insertion of Q required several probes (5). This

was caused by A and P mapping to slot 0 which is beside the C and D keys.

The performance of the hash table depends on a having a hash function which evenly distributes the keys. The statistics of the key distribution needs to be accounted

for. For example, choosing the first letter of a surname will cause problems depending on the nationality of the population; the variable names in a compiler often differ by one character, eg., t1, t2, t3, etc.

Consult computer science texts, such as Knuth’s “The Art of Computer Programming”.

Page 21: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

21

Clustering Even with a good hash function, linear probing has its problems:

The position of the initial mapping i 0 of key k is called the home position of k.

When several insertions map to the same home position, they end up placed contiguously in the table. This collection of keys with the same home position is called a cluster.

As clusters grow, the probability that a key will map to the middle of a cluster increases, increasing the rate of the cluster’s growth. This tendency of linear probing to place items together is known as primary clustering.

As these clusters grow, they merge with other clusters forming even bigger clusters which grow even faster.

Page 22: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

22

Performance Analysis If n slots in a table of size m are occupied, the load

factor is defined as

where =1 means the table is full, and =0 means the table is empty.

It can be shown that the number of probes in a successful search, C, and the number of probes in an unsuccessful search, C’ is given by:

21

11

21

11

121

C

C

mn

Page 23: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

23

Quadratic Probing h(k)=h(k) + f(i) ( i=0,1,2,…)%TS h(k)=Rmod TS f(i)=i2

Theorem 20.4: If quadratic probing is used and the table size is prime, then a new element can always be inserted if the table is at least half empty. Furthermore, in the course of the insertion, no cell is probed twice.

Page 24: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

24

Quadratic probing-example

89

18

89

49

18

89

49

58

18

89

49

58

9

18

89

Insert: 89, 18, 49, 58, 9 to table size=10, hash function is: %tablesize

0

1

2

3

4

5

6

7

8

9

Page 25: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

25

Double Hashing Recall that in open addressing the sequence of probes follows

We can solve the problem of primary clustering in linear probing by having the keys which map to the same home position use differing probe sequences. In other words, the different values for c should be used for different keys.

Double hashing refers to the scheme of using another hash function for c

Note that h1 and h2 need to be evaluated only once per key.

0formod)(1 jmcii jj

1)(0and0formod))(( 221 mkjmkii jj hh

Page 26: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

26

Chained Hash Table

2

4

10

3

nil

nilnil

5

nil

:

HASHMAX Key: 9903030name: tomscore: 73

One way to handle collision is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.

One way to handle collision is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.

Page 27: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

27

Chained Hash table Hash table, where collided records are

stored in linked listgood hash function, appropriate hash size

Few collisions. Add, delete, search very fast O(1)

otherwise… some hash value has a long list of collided records.. add - just insert at the head fast O(1) delete a target - delete from unsorted linked list

slow search - sequential search slow O(n)

Page 28: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

28

Common errors (page 749)

Providing a poor hash function Not rehashing when load factor reaches

0.5.

More errors listed on page 749 of the book

Page 29: 1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table

29

In class exercises

20.1, 20.2 and 20.5 in the book on page 750.