Open Addressing: Linear Probing
Data Structures and Algorithms
Linear Probing
• The easiest method to probe the bins of the hash table is to search forward linearly
• Assume we are inserting into bin i:
– if bin i is empty, we occupy it
– otherwise, check bin i + 1, i + 2, and so on, until an empty bin is found
– if we reach the end of the hash table, we start at the front (bin 0)
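The insertion rule above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the `EMPTY` marker, the function names, the 10-bin table, the last-digit hash, and the three demo keys are all assumptions made for the example.

```python
EMPTY = None  # marker for an unoccupied bin (a convention chosen for this sketch)

def last_digit(key):
    """Illustrative hash function: the least-significant decimal digit."""
    return key % 10

def insert(table, key, hash_fn=last_digit):
    """Store key in the first empty bin at or after hash_fn(key); return that bin."""
    M = len(table)
    start = hash_fn(key) % M
    for step in range(M):
        i = (start + step) % M  # wraps to bin 0 after the last bin
        if table[i] is EMPTY:
            table[i] = key
            return i
    raise RuntimeError("hash table is full")

demo = [EMPTY] * 10
for key in (19, 29, 9):  # illustrative keys that all hash to bin 9
    print(key, "-> bin", insert(demo, key))
```

The three demo keys all start at bin 9, so the second and third wrap around to bins 0 and 1.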
Linear Probing
• For example, suppose that our hash function converts a 2-digit integer into a single digit by taking the least-significant digit
• Not necessarily a bad idea: for most natural data, the least-significant digit is approximately random
• Such distributions are used to catch tax fraud
Linear Probing: Insertions
• Insert the numbers 81, 70, 97, 60, 51, 38, 89, 68, 24 into the initially empty hash table (-- marks an empty bin):
 0  1  2  3  4  5  6  7  8  9
-- -- -- -- -- -- -- -- -- --
Linear Probing : Insertions
• We can easily insert 81, 70, and 97 into their corresponding bins:
 0  1  2  3  4  5  6  7  8  9
70 81 -- -- -- -- -- 97 -- --
Linear Probing : Insertions
• Inserting 60 causes a collision in bin 0; therefore, we check:
– bin 1 (also full), and
– bin 2 (empty)
 0  1  2  3  4  5  6  7  8  9
70 81 60 -- -- -- -- 97 -- --
Linear Probing : Insertions
• Inserting 51 also causes a collision, this time in bin 1; therefore, we check:
– bin 2 (also full), and
– bin 3 (empty)
 0  1  2  3  4  5  6  7  8  9
70 81 60 51 -- -- -- 97 -- --
Linear Probing : Insertions
• 38 and 89 can be placed into bins 8 and 9 respectively without collisions
 0  1  2  3  4  5  6  7  8  9
70 81 60 51 -- -- -- 97 38 89
Linear Probing : Insertions
• Inserting 68 causes a collision in bin 8, and therefore we check bins:
– 9, 0, 1, 2, 3, and finally 4, which is empty
– insert 68 into bin 4
 0  1  2  3  4  5  6  7  8  9
70 81 60 51 68 -- -- 97 38 89
Linear Probing : Insertions
• Inserting 24 causes a collision in bin 4; however, the next bin (5) is empty
 0  1  2  3  4  5  6  7  8  9
70 81 60 51 68 24 -- 97 38 89
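The walkthrough above can be replayed with a short sketch that records which bins each insertion examines. The helper name `insert_with_trace` and the `EMPTY` marker are assumptions made for this illustration; the keys and the last-digit hash come from the example.

```python
EMPTY = None  # marker for an unoccupied bin (a convention chosen for this sketch)

def insert_with_trace(table, key):
    """Insert key using last-digit hashing; return the list of bins examined."""
    M = len(table)
    start = key % M
    probed = []
    for step in range(M):
        i = (start + step) % M
        probed.append(i)
        if table[i] is EMPTY:
            table[i] = key
            return probed
    raise RuntimeError("hash table is full")

table = [EMPTY] * 10
traces = {}
for key in [81, 70, 97, 60, 51, 38, 89, 68, 24]:
    traces[key] = insert_with_trace(table, key)
    print(key, traces[key])

print(table)  # [70, 81, 60, 51, 68, 24, None, 97, 38, 89]
```

For 68 the trace is bins 8, 9, 0, 1, 2, 3, 4: seven probes for a single insertion, even though the table is not yet full.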
Linear Probing: Searching
• Testing for membership is similar to insertions
• Start at the appropriate bin, and continue searching forward until either:
– the item is found,
– an empty bin is found, or
– we have traversed the entire array
• The last case will only occur if the hash table is full
Linear Probing: Searching
• Searching for 68, we first examine bin 8, then 9, 0, 1, 2, 3, and 4, finding 68 in bin 4
• Searching for 23, we search bins 3, 4, 5, and bin 6 is empty, so 23 is not in the table
 0  1  2  3  4  5  6  7  8  9
70 81 60 51 68 24 -- 97 38 89
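A membership test following the three stopping conditions might look like the sketch below. The function name `search`, the `EMPTY` marker, and the choice to return -1 for a missing key are assumptions of this example; the table contents are those produced by the insertions above.

```python
EMPTY = None  # marker for an unoccupied bin (a convention chosen for this sketch)

def search(table, key):
    """Return the bin holding key, or -1 if key is not in the table."""
    M = len(table)
    start = key % M  # last-digit hash when M == 10
    for step in range(M):
        i = (start + step) % M
        if table[i] is EMPTY:
            return -1  # an empty bin ends the probe sequence: key is absent
        if table[i] == key:
            return i   # the item is found
    return -1          # traversed the entire array (only possible if it is full)

table = [70, 81, 60, 51, 68, 24, EMPTY, 97, 38, 89]
print(search(table, 68))  # 4
print(search(table, 23))  # -1
```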
Linear Probing: Removing
• We cannot simply remove elements from the hash table
• For example, if we delete 89 by just emptying bin 9, we can no longer find 68: the search starts at bin 8 and stops at the now-empty bin 9
 0  1  2  3  4  5  6  7  8  9
70 81 60 51 68 24 -- 97 38 89
Linear Probing: Removing
• However, we cannot simply move all entries up to fill the gap
• Moving 70 to bin 9 would make it impossible to find 70
 0  1  2  3  4  5  6  7  8  9
70 81 60 51 68 24 -- 97 38 89
-- 81 60 51 68 24 -- 97 38 70
Linear Probing: Removing
• Instead, we must probe forward, moving an element back into the gap only if the gap does not come before that element's home bin in its probe sequence
• For example, we remove 89
 0  1  2  3  4  5  6  7  8  9
70 81 60 51 68 24 -- 97 38 --
Linear Probing: Removing
• We probe forward until we find an entry which can be moved into bin 9
• We cannot move 70, 81, 60, or 51, but we can move 68
 0  1  2  3  4  5  6  7  8  9
70 81 60 51 -- 24 -- 97 38 68
Linear Probing: Removing
• Next, we search forward again, and note that 24 can be moved back into bin 4
• The next cell (bin 6) is already empty, and therefore we are finished
 0  1  2  3  4  5  6  7  8  9
70 81 60 51 24 -- -- 97 38 68
Linear Probing: Removing
• Suppose we now remove 60
• Begin searching forward from bin 0
 0  1  2  3  4  5  6  7  8  9
70 81 60 51 24 -- -- 97 38 68
Linear Probing: Removing
• We find 60 in bin 2, and therefore we remove it
• We search forward and find that we can move 51 into bin 2
 0  1  2  3  4  5  6  7  8  9
70 81 51 -- 24 -- -- 97 38 68
Linear Probing: Removing
• We cannot move 24 into bin 3, because bin 3 comes before its home bin (4)
• The next bin (5) is empty, therefore we are finished
 0  1  2  3  4  5  6  7  8  9
70 81 51 -- 24 -- -- 97 38 68
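The removal procedure walked through above can be sketched as follows. This is one possible formulation under stated assumptions: the names `remove` and `EMPTY` are hypothetical, the hash is the last digit (key modulo the table size), and the "can this entry be moved into the gap" test is written as a comparison of cyclic distances from the entry's home bin.

```python
EMPTY = None  # marker for an unoccupied bin (a convention chosen for this sketch)

def remove(table, key):
    """Remove key, then shift later entries back so every key stays reachable."""
    M = len(table)
    i = key % M
    for _ in range(M):           # locate the key by probing forward
        if table[i] is EMPTY:
            return False
        if table[i] == key:
            break
        i = (i + 1) % M
    else:
        return False             # full table and key not present

    gap = i
    table[gap] = EMPTY
    j = (gap + 1) % M
    while table[j] is not EMPTY:  # stop at the first empty bin after the gap
        home = table[j] % M
        # Move the entry only if the gap lies on its probe path, i.e. the gap
        # is cyclically closer to the home bin than its current position j.
        if (gap - home) % M < (j - home) % M:
            table[gap] = table[j]
            table[j] = EMPTY
            gap = j
        j = (j + 1) % M
    return True

table = [70, 81, 60, 51, 68, 24, EMPTY, 97, 38, 89]
remove(table, 89)
print(table)  # [70, 81, 60, 51, 24, None, None, 97, 38, 68]
remove(table, 60)
print(table)  # [70, 81, 51, None, 24, None, None, 97, 38, 68]
```

An alternative design, not used in these slides, is to mark removed bins with a special "deleted" marker instead of shifting entries.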
Primary Clustering
• We have already observed the following phenomenon:
– as we insert more elements into the hash table, the contiguous regions get larger
• This results in longer search times
Primary Clustering
• Consider inserting the following entries 81, 70, 97, 63, 76, 38, 85, 68, 21, 9, 55, 73, 57, 60, 72, 74, 85, 16, 61, 7, 49
• Use the number modulo 25 to determine which bin it should occupy
• The first five don't cause any collisions
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
-- 76 -- -- -- -- 81 -- -- -- -- -- -- 63 -- -- -- -- -- -- 70 -- 97 -- --
Primary Clustering
• Inserting 38 causes a collision in bin 13
• The next seven do not cause any further collisions
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
-- 76 -- -- -- 55 81 57 --  9 85 -- -- 63 38 -- -- -- 68 -- 70 21 97 73 --
Primary Clustering
• The next four insertions cause collisions:
– 60 (bin 10)
– 72 (bin 22)
– 74 (bin 24)
– 85 (bin 10)
• We can safely insert 16 into bin 16
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
74 76 -- -- -- 55 81 57 --  9 85 60 85 63 38 -- 16 -- 68 -- 70 21 97 73 72
Primary Clustering
• The remaining insertions all cause collisions:
– 61 (bin 11)
– 7 (bin 7)
– 49 (bin 24)
• The joining of smaller groups into one large group is termed coalescing
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
74 76 49 -- -- 55 81 57  7  9 85 60 85 63 38 61 16 -- 68 -- 70 21 97 73 72
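To make the coalescing visible, the following sketch replays the 21 insertions modulo 25 and measures the lengths of the contiguous occupied runs, treating the table as circular. The helper names `insert` and `cluster_lengths` and the `EMPTY` marker are assumptions made for this illustration.

```python
EMPTY = None  # marker for an unoccupied bin (a convention chosen for this sketch)

def insert(table, key):
    """Linear-probing insert using key modulo the table size as the home bin."""
    M = len(table)
    for step in range(M):
        i = (key % M + step) % M
        if table[i] is EMPTY:
            table[i] = key
            return i
    raise RuntimeError("hash table is full")

def cluster_lengths(table):
    """Lengths of maximal runs of occupied bins, wrapping from the last bin to bin 0."""
    M = len(table)
    empties = [i for i, v in enumerate(table) if v is EMPTY]
    if not empties:
        return [M]
    lengths, run = [], 0
    for step in range(1, M + 1):          # walk once around, starting after an empty bin
        i = (empties[0] + step) % M
        if table[i] is EMPTY:
            if run:
                lengths.append(run)
            run = 0
        else:
            run += 1
    return lengths

keys = [81, 70, 97, 63, 76, 38, 85, 68, 21, 9, 55, 73, 57,
        60, 72, 74, 85, 16, 61, 7, 49]
table = [EMPTY] * 25
for key in keys:
    insert(table, key)

print(cluster_lengths(table))  # [12, 1, 8]
```

With 21 of 25 bins occupied, only three clusters remain: bins 5 to 16, bin 18 alone, and bins 20 to 24 wrapping around to bins 0 to 2.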
Primary Clustering
• As the load factor increased, the probability of a collision increased
• Justification:
– suppose that a chain is of length m
– an insertion either into any bin occupied by the chain or into the locations immediately before or after it will increase the length of the chain
Primary Clustering
• Example, using the last two digits, consider the following hash table
• Any insertion into bins 29 through 34 will increase the length of the chain
...  28  29  30  31  32  33  34  35 ...
...  --  -- 230 531 730 432  --  -- ...
Primary Clustering
• Consequently, if a chain is of size m, then the probability that it will be increased in length is (m + 2)/M where M is the size of the hash table
• The more a chain grows, the more likely it will grow in the future
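As a small worked check of the (m + 2)/M claim, the sketch below evaluates it for the chain in the example. It assumes keys hash uniformly over the M bins and that the bins on both sides of the chain are empty; M = 100 is inferred from hashing on the last two digits, and the function name is hypothetical.

```python
from fractions import Fraction

def growth_probability(m, M):
    """Chance that one uniformly hashed insertion lengthens a chain of m bins.

    Assumes the bins immediately before and after the chain are empty.
    """
    return Fraction(m + 2, M)

M = 100                                  # last two digits give 100 bins (assumed)
chain_bins = [30, 31, 32, 33]            # bins holding 230, 531, 730, 432
growing_bins = list(range(chain_bins[0] - 1, chain_bins[-1] + 2))  # 29 .. 34

print(growing_bins)                              # [29, 30, 31, 32, 33, 34]
print(growth_probability(len(chain_bins), M))    # 3/50
print(growth_probability(len(chain_bins) + 1, M))  # 7/100
```

Each time the chain grows by one bin, the numerator grows by one as well, which is the feedback effect described above.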
Primary Clustering
• The length of these chains will affect the number of probes required to perform insertions, accesses, or removals
• It is possible to estimate the average number of probes for a successful search, where λ is the load factor:
$$\frac{1}{2}\left(1 + \frac{1}{1 - \lambda}\right)$$
• For example: if λ = 0.5, we require 1.5 probes
Primary Clustering
• The number of probes for an unsuccessful search or for an insertion is higher:
$$\frac{1}{2}\left(1 + \frac{1}{(1 - \lambda)^2}\right)$$
• For 0 ≤ λ < 1, we have (1 – λ)² ≤ 1 – λ, and therefore the reciprocal will be larger
• Again, if λ = 0.5 then we require 2.5 probes
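The two estimates can be evaluated directly for any load factor below 1. The function names in this sketch are assumptions; the formulas are the ones stated above for successful and unsuccessful searches.

```python
def probes_successful(lam):
    """Estimated average probes for a successful search at load factor lam."""
    return 0.5 * (1 + 1 / (1 - lam))

def probes_unsuccessful(lam):
    """Estimated average probes for an unsuccessful search or an insertion."""
    return 0.5 * (1 + 1 / (1 - lam) ** 2)

print(probes_successful(0.5))    # 1.5
print(probes_unsuccessful(0.5))  # 2.5

# The gap widens quickly as the table fills up.
for lam in (0.5, 0.75, 0.9):
    print(f"{lam:.2f}  {probes_successful(lam):6.2f}  {probes_unsuccessful(lam):6.2f}")
```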
Primary Clustering
• The following plot shows how the number of required probes increases
Primary Clustering
• Our goal was to keep all operations O(1)
• Unfortunately, as λ grows, so does the run time
• One solution is to keep the load factor under a given bound
• If we choose λ = 2/3, then the number of probes is 2 for a successful search and 5 for an unsuccessful search
Primary Clustering
• Therefore, we have three choices:
– Choose M large enough so that we will not pass this load factor
– Double the number of bins if the chosen load factor is reached
– Choose a different strategy from linear probing
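The second choice, doubling the number of bins, might be sketched as below. This is an illustration under stated assumptions rather than a reference implementation: the class name `LinearProbingSet`, the initial capacity of 8, the 2/3 bound as a default, and the use of Python's built-in `hash` are all choices made for the example, and `None` cannot be stored as a key because it serves as the empty marker.

```python
EMPTY = None  # marker for an unoccupied bin (a convention chosen for this sketch)

class LinearProbingSet:
    """Set of keys stored with linear probing; doubles when the load bound is passed."""

    def __init__(self, capacity=8, max_load=2 / 3):
        self.bins = [EMPTY] * capacity
        self.count = 0
        self.max_load = max_load

    @staticmethod
    def _place(bins, key):
        """Probe forward from the home bin; return True if key was newly stored."""
        M = len(bins)
        i = hash(key) % M
        while bins[i] is not EMPTY:   # terminates: the load factor stays below 1
            if bins[i] == key:
                return False          # already present
            i = (i + 1) % M
        bins[i] = key
        return True

    def add(self, key):
        # Grow first if storing one more key would exceed the chosen bound.
        if (self.count + 1) / len(self.bins) > self.max_load:
            self._grow()
        if self._place(self.bins, key):
            self.count += 1

    def _grow(self):
        """Double the number of bins and re-insert every key at its new home bin."""
        old = self.bins
        self.bins = [EMPTY] * (2 * len(old))
        for key in old:
            if key is not EMPTY:
                self._place(self.bins, key)

s = LinearProbingSet()
for k in range(20):
    s.add(k)
print(s.count, len(s.bins))  # 20 32
```

Every key must be re-inserted after doubling because its home bin depends on the table size.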
Primary Clustering
• The first solution (choose M sufficiently large) is most useful if we know all the possible entries
• The second (doubling) is only useful if we have an environment where we can dynamically allocate memory
• For the third, we will look at quadratic probing and double hashing