ECE250: Algorithms and Data Structures
Hash Tables (Part A)
Materials from CLRS: Chapter 11.1, 11.2, 11.4
Ladan Tahvildari, PEng, SMIEEE Professor
Software Technologies Applied Research (STAR) Group
Dept. of Elect. & Comp. Eng.
University of Waterloo
Acknowledgements
v The following resources have been used to prepare materials for this course:
Ø MIT OpenCourseWare
Ø Introduction to Algorithms (CLRS Book)
Ø Data Structures and Algorithm Analysis in C++ (M. Weiss)
Ø Data Structures and Algorithms in C++ (M. Goodrich)
v Thanks to many people for pointing out mistakes, providing suggestions, or helping to improve the quality of this course over the last ten years:
Ø http://www.stargroup.uwaterloo.ca/~ece250/acknowledgment/
Lecture 8 ECE250 2
The Problem
v RT&T is a large phone company, and they want to provide caller ID capability:
Ø given a phone number, return the caller’s name
Ø phone numbers range from 0 to r = 10^8 - 1
Ø want to do this as efficiently as possible
A Potential Solution
v A suboptimal way to design this dictionary is an array indexed by key
§ lookup takes O(1) time
§ O(r) space - huge amount of wasted space
[Figure: array indexed by phone number. Every slot holds (null) except slot 9635-8904, which holds "Jens Jensen".]
Symbol-Table Problem
v Symbol table T holding n records
Ø each record x has a key, key[x]
v Operations on T
Ø INSERT(T, x)
Ø DELETE(T, x)
Ø SEARCH(T, k)
How should the data structure T be organized?
Direct Access Table
IDEA: Suppose that the set of keys is $K \subseteq \{0, 1, \ldots, m-1\}$ and keys are distinct. Set up an array $T[0 \,..\, m-1]$:

$$T[k] = \begin{cases} x & \text{if } k \in K \text{ and } key[x] = k \\ \text{NIL} & \text{otherwise} \end{cases}$$

Operations take $\Theta(1)$ time.
Problem: The range of keys can be large
64-bit numbers (represent 18,446,744,073,709,551,616 different keys)
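A minimal sketch of a direct-access table, assuming integer keys in the range 0 to m - 1. The class and method names are hypothetical and only illustrate the idea; Python's `None` stands in for NIL.

```python
class DirectAddressTable:
    """Direct-access table: slot k holds the record whose key is k, else None (NIL)."""

    def __init__(self, m):
        self.slots = [None] * m  # T[0..m-1], every slot starts as NIL

    def insert(self, key, record):
        self.slots[key] = record  # one array write: Theta(1)

    def delete(self, key):
        self.slots[key] = None  # one array write: Theta(1)

    def search(self, key):
        return self.slots[key]  # one array read: Theta(1)


table = DirectAddressTable(10)
table.insert(3, "record with key 3")
```

Every operation is a single array access, but the array needs one slot per possible key, which is what makes the approach impractical for large key ranges.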
Hash Functions
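Solution: Use a hash function $h$ to map the universe $U$ of all keys into $\{0, 1, \ldots, m-1\}$
[Figure: keys $k_1, k_2, k_3, k_4$ from the set $K \subseteq U$ are mapped by $h$ to slots of the table $T[0 \,..\, m-1]$, with $h(k_2) = h(k_3)$.]
When a record to be inserted maps to an already occupied slot in $T$, a collision occurs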
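To make the notion of a collision concrete, here is a small sketch assuming a division-style hash, h(k) = k mod m. The function names and the four sample keys are made up for illustration.

```python
def h(key, m):
    """Map an integer key to a slot number in 0..m-1 (assumed hash: key mod m)."""
    return key % m


def slots_with_collisions(keys, m):
    """Group keys by slot and keep only the slots that received more than one key."""
    buckets = {}
    for key in keys:
        buckets.setdefault(h(key, m), []).append(key)
    return {slot: ks for slot, ks in buckets.items() if len(ks) > 1}


# Hypothetical keys: 25 and 35 both map to slot 5 when m = 10.
collisions = slots_with_collisions([12, 25, 35, 47], 10)
```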
Collision Resolution
v How to deal with two keys which hash to the same spot in the array?
Ø Use chaining, which sets up an array of links (a table), indexed by hash value, to lists of the items whose keys hash to the same slot
An Example
Given the following input
{3171, 4123, 5973, 2699, 5344, 5879, 9789}
and the following hash function
h(x) = x mod 10
show the resulting hash table using Chaining
Collision Resolution by Chaining
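Slot  Chain (head first; each new key is inserted at the head of its list)
0     (empty)
1     3171
2     (empty)
3     5973 → 4123
4     5344
5     (empty)
6     (empty)
7     (empty)
8     (empty)
9     9789 → 5879 → 2699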
Dictionary Operations with Chaining
v Search: CHAINED-HASH-SEARCH(T, k)
Ø search for an element with key k in list T[h(k)]
v Insertion: CHAINED-HASH-INSERT(T, x)
Ø insert x at the head of list T[h(key[x])]
v Deletion: CHAINED-HASH-DELETE(T, x)
Ø delete x from the list T[h(key[x])]
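The three operations above can be sketched as follows. This is an illustration, not the course's reference implementation: it assumes integer keys, the hash h(x) = x mod m, and a `collections.deque` per slot standing in for a linked list. The class name is hypothetical, and for simplicity the sketch stores bare keys and deletes by key rather than by a pointer to the element x.

```python
from collections import deque


class ChainedHashTable:
    """Hash table with chaining; each slot holds a deque used as the chain."""

    def __init__(self, m):
        self.m = m
        self.slots = [deque() for _ in range(m)]

    def _h(self, key):
        return key % self.m  # assumed hash function: key mod m

    def insert(self, key):
        self.slots[self._h(key)].appendleft(key)  # insert at the head of the chain

    def search(self, key):
        return key in self.slots[self._h(key)]  # scan only the chain for this slot

    def delete(self, key):
        self.slots[self._h(key)].remove(key)  # raises ValueError if key is absent


table = ChainedHashTable(10)
for key in [3171, 4123, 5973, 2699, 5344, 5879, 9789]:
    table.insert(key)
```

With the keys inserted in the order 3171, 4123, 5973, 2699, 5344, 5879, 9789, the keys ending in 3 share slot 3 and the keys ending in 9 share slot 9, with the most recently inserted key at the head of each chain.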
Analysis of Hashing
v Assumption (Simple Uniform Hashing): Each key is equally likely to be hashed into any slot of table $T$, independent of where other keys are hashed
v Given hash table $T$ with $m$ slots holding $n$ elements, the load factor is defined as $\alpha = n/m$ (Average Number of Keys per Slot)
v Assume time to compute $h(k)$ is $\Theta(1)$
v Search Time: $\Theta(1 + \alpha)$
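As a worked instance of these definitions, take the earlier chaining example with n = 7 keys and m = 10 slots. The split of the search cost into a hashing term and a chain-scanning term is the usual argument under simple uniform hashing, where the expected length of a chain is the load factor; it is sketched here rather than proved.

```latex
\alpha = \frac{n}{m} = \frac{7}{10} = 0.7,
\qquad
\underbrace{\Theta(1)}_{\text{compute } h(k)}
+ \underbrace{\Theta(\alpha)}_{\text{scan the chain } T[h(k)]}
= \Theta(1 + \alpha)
```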
Analysis of Operations with Chaining
v Assuming the number of hash table slots is proportional to the number of elements in the table, $n = O(m)$, so $\alpha = n/m = O(m)/m = O(1)$
Ø Search:
§ takes constant time on average
Ø Insertion:
§ takes O(1) worst-case time
o Assumes that the element being inserted isn’t already in the list
o It would take an additional search to check if it was already inserted
Ø Deletion:
§ takes O(1) worst-case time when the lists are doubly linked
§ If the lists are singly linked, then deletion takes as long as searching, because we must find x’s predecessor in its list in order to correctly update next pointers
More on Collisions
v A key is mapped to an already occupied table location
Ø what to do?!?
v Use a collision handling technique
v We have seen Chaining
v Can also use Open Addressing
Ø Linear Probing
Ø Double Hashing
Open Addressing
v All elements are stored in the hash table (n ≤ m)
v Insertion systematically probes the table until an empty slot is found → The table may fill up!
v Modify hash function to take the probe number i as the second parameter (depends on both the key and the probe number):
$$h : U \times \{0, 1, \ldots, m-1\} \to \{0, 1, \ldots, m-1\}$$
(the second argument is the probe number; the result is the slot number)
v Hash function h determines the sequence of slots examined for a given key
v Probe sequence for a given key k:
$$\langle h(k, 0), h(k, 1), \ldots, h(k, m-1) \rangle \text{ is a permutation of } \langle 0, 1, \ldots, m-1 \rangle$$
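The permutation requirement can be checked on a small case. The sketch below assumes one particular probe function, linear probing with h(k, i) = (k mod m + i) mod m; the function name and the sample key are illustrative only.

```python
def probe_sequence(key, m):
    """Slots examined for `key`: h(key, 0), h(key, 1), ..., h(key, m-1).

    Assumes linear probing on top of the hash key mod m.
    """
    return [(key % m + i) % m for i in range(m)]


sequence = probe_sequence(5879, 10)
# Every slot number appears exactly once, so the sequence is a permutation.
is_permutation = sorted(sequence) == list(range(10))
```

Because the sequence visits every slot exactly once, an insertion is guaranteed to find an empty slot whenever one exists.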
Linear Probing
v If the current location is used, try the next table location:
$$h(k, i) = (h'(k) + i) \bmod m$$
where h' is an ordinary (auxiliary) hash function and i is the probe number
v Uses less memory than chaining
Ø one does not have to store all those links
v Can be slower than chaining because of Primary Clustering
Ø runs of occupied slots build up, so one might have to walk along the table for a long time

LinearProbingInsert(k)
    if (table is full) error
    probe = h'(k)
    while (table[probe] occupied)
        probe = (probe + 1) mod m
    table[probe] = k
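A runnable sketch of the insertion procedure, assuming integer keys, the hash key mod m, and `None` as the marker for an empty slot. The function name and the choice of exception are not from the lecture.

```python
EMPTY = None  # marker for an unoccupied slot


def linear_probing_insert(table, key):
    """Insert `key` into `table` (a list of m slots) using linear probing."""
    m = len(table)
    if all(slot is not EMPTY for slot in table):
        raise OverflowError("hash table is full")
    probe = key % m  # assumed hash: key mod m
    while table[probe] is not EMPTY:
        probe = (probe + 1) % m  # try the next location, wrapping around
    table[probe] = key
    return probe  # slot where the key ended up


table = [EMPTY] * 10
for key in [3171, 4123, 5973, 2699, 5344, 5879, 9789]:
    linear_probing_insert(table, key)
```

In this run 5973 finds slot 3 taken by 4123 and moves to slot 4, and 9789 probes slots 9, 0 and 1 before settling in slot 2, which is a small instance of the clustering effect.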
Hash Tables - Example 1
Given the following input
{3171, 4123, 5973, 2699, 5344, 5879, 9789}
and the following hash function
h(x) = x mod 10
show the resulting hash table using
Ø Chaining
Ø Linear Probing
Collision Resolution by Chaining
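Slot  Chain (head first; each new key is inserted at the head of its list)
0     (empty)
1     3171
2     (empty)
3     5973 → 4123
4     5344
5     (empty)
6     (empty)
7     (empty)
8     (empty)
9     9789 → 5879 → 2699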
Collision Resolution by Linear Probing
Slot  Key
0     5879
1     3171
2     9789
3     4123
4     5973
5     5344
6     (empty)
7     (empty)
8     (empty)
9     2699
Hash Tables – Example 2
Show the resulting hash table using Linear Probing when the following keys:
{One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Eleven, Twelve}
are inserted one-by-one, in the order given, into an initially empty table. Assume that the table size is m = 16. Use the division method of hashing. Use the following table of values for each key:

x        value (hexadecimal)
One      0x6EBE5
Two      0x75DAF
Three    0x75A73925
Four     0x19EED32
Five     0x19E8DE5
Six      0x72A38
Seven    0x7293792E
Eight    0x64A26A74
Nine     0x1BE8BE5
Ten      0x7592E
Eleven   0x4993792E
Twelve   0x292DDE5
Hash Tables – Example 2: Solution
Slot  Key
0     Ten
1     Eleven
2     Four
3     (empty)
4     Eight
5     One
6     Three
7     Five
8     Six
9     Nine
A     Twelve
B     (empty)
C     (empty)
D     (empty)
E     Seven
F     Two
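The solution can be reproduced with a short script. This sketch takes the division method to mean h(x) = x mod m, which for m = 16 is the last hexadecimal digit of each value; the names `VALUES`, `M` and `build_table` are illustrative.

```python
# Key names and their hexadecimal values, in insertion order.
VALUES = {
    "One": 0x6EBE5, "Two": 0x75DAF, "Three": 0x75A73925, "Four": 0x19EED32,
    "Five": 0x19E8DE5, "Six": 0x72A38, "Seven": 0x7293792E, "Eight": 0x64A26A74,
    "Nine": 0x1BE8BE5, "Ten": 0x7592E, "Eleven": 0x4993792E, "Twelve": 0x292DDE5,
}
M = 16  # table size


def build_table(values, m):
    """Insert each name by linear probing, starting at slot value mod m."""
    table = [None] * m
    for name, value in values.items():  # dicts preserve insertion order
        probe = value % m  # division method (assumed): x mod m
        while table[probe] is not None:
            probe = (probe + 1) % m  # next slot, wrapping around
        table[probe] = name
    return table


table = build_table(VALUES, M)
for slot, name in enumerate(table):
    print(format(slot, "X"), name if name is not None else "(empty)")
```

Five of the twelve values end in the hex digit 5 and three end in E, which is why slots 5 through A and slots E through 1 fill up as contiguous runs.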