Open Addressing: Linear Probing Data Structures and Algorithms


Page 1: Linear Probing

Open Addressing: Linear Probing

Data Structures and Algorithms

Page 2: Linear Probing

Linear Probing

• The easiest method to probe the bins of the hash table is to search forward linearly

• Assume we are inserting into bin i:
  – if bin i is empty, we occupy it
  – otherwise, check bin i + 1, i + 2, and so on, until an empty bin is found
  – if we reach the end of the hash table, we start at the front (bin 0)
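The insertion rule above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the function name `insert`, the use of `None` to mark an empty bin, and passing the hash function as a parameter are all assumptions made for illustration.

```python
def insert(table, key, hash_fn):
    """Insert key with linear probing; return the index of the bin used.

    Assumptions for this sketch: `table` is a list whose empty bins
    hold None, and `hash_fn` maps a key to a non-negative integer.
    """
    M = len(table)
    start = hash_fn(key) % M
    for step in range(M):
        i = (start + step) % M      # past the last bin, wrap to bin 0
        if table[i] is None:        # empty bin: occupy it
            table[i] = key
            return i
    raise RuntimeError("hash table is full")


table = [None] * 10
last_digit = lambda k: k % 10       # hypothetical hash function
insert(table, 89, last_digit)       # goes to bin 9
insert(table, 99, last_digit)       # bin 9 is full, wraps to bin 0
```

Bounding the loop at M steps means a full table raises an error instead of probing forever.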

Page 3: Linear Probing

Linear Probing

• For example, suppose that our hash function converts a 2-digit integer into a single digit by taking the least-significant digit

• Not necessarily a bad idea: for most natural data, the least-significant digit is approximately random

• Digit distributions of this kind are used to detect tax fraud: fabricated figures tend to deviate from them

Page 4: Linear Probing

Linear Probing: Insertions

• Insert the numbers 81, 70, 97, 60, 51, 38, 89, 68, 24 into the initially empty hash table:

Bin:     0  1  2  3  4  5  6  7  8  9
Entry:   -  -  -  -  -  -  -  -  -  -

Page 5: Linear Probing

Linear Probing: Insertions

• We can easily insert 81, 70, and 97 into their corresponding bins:

Bin:     0  1  2  3  4  5  6  7  8  9
Entry:  70 81  -  -  -  -  - 97  -  -

Page 6: Linear Probing

Linear Probing: Insertions

• Inserting 60 causes a collision in bin 0; therefore, we check:
  – bin 1 (also full), and
  – bin 2 (empty)

Bin:     0  1  2  3  4  5  6  7  8  9
Entry:  70 81 60  -  -  -  - 97  -  -

Page 7: Linear Probing

Linear Probing: Insertions

• Inserting 51 also causes a collision, this time in bin 1; therefore, we check:
  – bin 2 (also full), and
  – bin 3 (empty)

Bin:     0  1  2  3  4  5  6  7  8  9
Entry:  70 81 60 51  -  -  - 97  -  -

Page 8: Linear Probing

Linear Probing: Insertions

• 38 and 89 can be placed into bins 8 and 9 respectively without collisions

Bin:     0  1  2  3  4  5  6  7  8  9
Entry:  70 81 60 51  -  -  - 97 38 89

Page 9: Linear Probing

Linear Probing: Insertions

• Inserting 68 causes a collision in bin 8, and therefore we check bins:
  – 9, 0, 1, 2, 3, and finally 4, which is empty
  – insert 68 into bin 4

Bin:     0  1  2  3  4  5  6  7  8  9
Entry:  70 81 60 51 68  -  - 97 38 89

Page 10: Linear Probing

Linear Probing: Insertions

• Inserting 24 causes a collision in bin 4; however, the next bin is empty

Bin:     0  1  2  3  4  5  6  7  8  9
Entry:  70 81 60 51 68 24  - 97 38 89
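The whole insertion sequence can be replayed with a short script. This is a sketch under stated assumptions: the helper `insert_counting` is a hypothetical name, empty bins are represented by `None`, and the hash is taken to be `key % 10` (the least-significant digit).

```python
def insert_counting(table, key):
    """Insert key by linear probing; return (bin used, bins examined).

    Sketch only: assumes at least one bin is empty, otherwise the
    loop would not terminate.
    """
    M = len(table)
    i = key % M                 # least-significant digit when M == 10
    examined = 1
    while table[i] is not None:
        i = (i + 1) % M         # wrap around to bin 0 after bin M - 1
        examined += 1
    table[i] = key
    return i, examined


table = [None] * 10
for key in [81, 70, 97, 60, 51, 38, 89, 68, 24]:
    bin_index, examined = insert_counting(table, key)
    print(f"{key}: placed in bin {bin_index} after examining {examined} bin(s)")

print(table)  # [70, 81, 60, 51, 68, 24, None, 97, 38, 89]
```

The trace shows the cost of clustering directly: 68 has to examine seven bins (8, 9, 0, 1, 2, 3, 4) before it finds an empty one.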

Page 11: Linear Probing

Linear Probing: Searching

• Testing for membership is similar to insertions
• Start at the appropriate bin, and continue searching forward until either:
  – the item is found,
  – an empty bin is found, or
  – we have traversed the entire array

• The last case will only occur if the hash table is full

Page 12: Linear Probing

Linear Probing: Searching

• Searching for 68, we first examine bin 8, then 9, 0, 1, 2, 3, and 4, finding 68 in bin 4

• Searching for 23, we examine bins 3, 4, and 5, then find that bin 6 is empty, so 23 is not in the table

Bin:     0  1  2  3  4  5  6  7  8  9
Entry:  70 81 60 51 68 24  - 97 38 89
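The three stopping conditions for a search can be sketched as follows. The function name `contains` and the `None` marker for empty bins are assumptions of this sketch, and the hash is again assumed to be `key % M`.

```python
def contains(table, key):
    """Membership test for a linear-probing table (sketch)."""
    M = len(table)
    start = key % M
    for step in range(M):            # at most M bins: covers a full table
        i = (start + step) % M
        if table[i] is None:         # empty bin: key cannot lie further on
            return False
        if table[i] == key:          # item found
            return True
    return False                     # traversed the entire array


table = [70, 81, 60, 51, 68, 24, None, 97, 38, 89]
print(contains(table, 68))  # True: bins 8, 9, 0, 1, 2, 3, then found in 4
print(contains(table, 23))  # False: bins 3, 4, 5, then bin 6 is empty
```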

Page 13: Linear Probing

Linear Probing: Removing

• We cannot simply remove elements from the hash table
• For example, if we delete 89 by just emptying bin 9, we can no longer find 68: the search starts at bin 8 and stops at the now-empty bin 9

Bin:     0  1  2  3  4  5  6  7  8  9
Entry:  70 81 60 51 68 24  - 97 38 89

Page 14: Linear Probing

Linear Probing: Removing

• However, we cannot simply move all entries up to fill the gap

• Moving 70 to bin 9 would make it impossible to find 70

Bin:      0  1  2  3  4  5  6  7  8  9
Before:  70 81 60 51 68 24  - 97 38 89
After:   81 60 51 68 24  -  - 97 38 70

Page 15: Linear Probing

Linear Probing: Removing

• Instead, we must probe forward from the gap, moving an entry into the gap only if the gap does not lie before that entry's home bin (the bin selected by its hash value) in probe order

• For example, we remove 89

Bin:     0  1  2  3  4  5  6  7  8  9
Entry:  70 81 60 51 68 24  - 97 38  -

Page 16: Linear Probing

Linear Probing: Removing

• We probe forward until we find an entry which can be moved into bin 9

• We cannot move 70, 81, 60, or 51, but we can move 68

Bin:     0  1  2  3  4  5  6  7  8  9
Entry:  70 81 60 51  - 24  - 97 38 68

Page 17: Linear Probing

Linear Probing: Removing

• Next, we search forward again, and note that 24 can be moved forward

• The next cell is already empty, and therefore we are finished

Bin:     0  1  2  3  4  5  6  7  8  9
Entry:  70 81 60 51 24  -  - 97 38 68

Page 18: Linear Probing

Linear Probing: Removing

• Suppose we now remove 60
• Begin searching forward from bin 0

Bin:     0  1  2  3  4  5  6  7  8  9
Entry:  70 81 60 51 24  -  - 97 38 68

Page 19: Linear Probing

Linear Probing: Removing

• We find 60 in bin 2, and therefore we remove it
• We search forward and find that we can move 51 into bin 2

Bin:     0  1  2  3  4  5  6  7  8  9
Entry:  70 81 51  - 24  -  - 97 38 68

Page 20: Linear Probing

Linear Probing: Removing

• We cannot move 24 forward: its home is bin 4, which lies after the gap in bin 3
• The next bin (5) is empty; therefore, we are finished

Bin:     0  1  2  3  4  5  6  7  8  9
Entry:  70 81 51  - 24  -  - 97 38 68
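The removal procedure walked through above can be sketched as below. This is one possible formulation, not necessarily the one the original course code uses: the name `remove`, the `None` marker for empty bins, the hash `key % M`, and the modular-distance test for "may this entry move into the gap" are all assumptions of the sketch.

```python
def remove(table, key):
    """Remove key from a linear-probing table, then repair the cluster.

    An entry at bin j with home bin h may move back into the gap only
    if the gap lies cyclically within [h, j), i.e. not before its home.
    """
    M = len(table)
    i = key % M
    for _ in range(M):               # locate the key
        if table[i] is None:
            return False             # reached an empty bin: not present
        if table[i] == key:
            break
        i = (i + 1) % M
    else:
        return False                 # full table, key not present

    gap = i
    table[gap] = None
    j = (gap + 1) % M
    while table[j] is not None:      # probe forward to the next empty bin
        home = table[j] % M
        if (gap - home) % M < (j - home) % M:
            table[gap] = table[j]    # safe to move: still reachable from home
            table[j] = None
            gap = j                  # the gap has moved to bin j
        j = (j + 1) % M
    return True


table = [70, 81, 60, 51, 68, 24, None, 97, 38, 89]
remove(table, 89)
print(table)  # [70, 81, 60, 51, 24, None, None, 97, 38, 68]
remove(table, 60)
print(table)  # [70, 81, 51, None, 24, None, None, 97, 38, 68]
```

The two calls reproduce the slides: removing 89 moves 68 and then 24, and removing 60 moves only 51.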

Page 21: Linear Probing

Primary Clustering

• We have already observed the following phenomenon:
  – as we insert more elements into the hash table, the contiguous regions get larger

• This results in longer search times

Page 22: Linear Probing

Primary Clustering

• Consider inserting the following entries 81, 70, 97, 63, 76, 38, 85, 68, 21, 9, 55, 73, 57, 60, 72, 74, 85, 16, 61, 7, 49

• Use the number modulo 25 to determine which bin it should occupy

• The first five don’t cause any collisions

Bin:     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Entry:   - 76  -  -  -  - 81  -  -  -  -  -  - 63  -  -  -  -  -  - 70  - 97  -  -

Page 23: Linear Probing

Primary Clustering

• Inserting 38 causes a collision in bin 13
• The next seven do not cause any further collisions

Bin:     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Entry:   - 76  -  -  - 55 81 57  -  9 85  -  - 63 38  -  -  - 68  - 70 21 97 73  -

Page 24: Linear Probing

Primary Clustering

• The next four insertions cause collisions:
  – 60 (bin 10)
  – 72 (bin 22)
  – 74 (bin 24)
  – 85 (bin 10)
• We can safely insert 16 into bin 16

Bin:     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Entry:  74 76  -  -  - 55 81 57  -  9 85 60 85 63 38  - 16  - 68  - 70 21 97 73 72

Page 25: Linear Probing

Primary Clustering

• The remaining insertions all cause collisions:
  – 61 (bin 11)
  – 7 (bin 7)
  – 49 (bin 24)
• The joining of smaller groups into one large group is termed coalescing

Bin:     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Entry:  74 76 49  -  - 55 81 57  7  9 85 60 85 63 38 61 16  - 68  - 70 21 97 73 72
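To see the coalescing numerically, the example can be replayed and the contiguous runs measured. This is an illustrative sketch: `insert` and `cluster_lengths` are hypothetical helper names, empty bins are `None`, and the table is treated as circular because probing wraps from bin 24 to bin 0.

```python
def insert(table, key):
    """Linear-probing insert with hash key % M (assumes a free bin exists)."""
    M = len(table)
    i = key % M
    while table[i] is not None:
        i = (i + 1) % M
    table[i] = key


def cluster_lengths(table):
    """Lengths of maximal runs of occupied bins in a circular table."""
    M = len(table)
    if all(entry is not None for entry in table):
        return [M]
    # Start counting just after some empty bin so no run is split in two.
    start = next(i for i in range(M) if table[i] is None)
    lengths, run = [], 0
    for step in range(1, M + 1):
        if table[(start + step) % M] is not None:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    return lengths


keys = [81, 70, 97, 63, 76, 38, 85, 68, 21, 9, 55, 73, 57,
        60, 72, 74, 85, 16, 61, 7, 49]
table = [None] * 25
for key in keys:
    insert(table, key)

print(cluster_lengths(table))  # [12, 1, 8]
```

With 21 of 25 bins occupied, the entries sit in just three runs: bins 5 to 16, bin 18 alone, and bins 20 to 24 wrapping around to bins 0 to 2.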

Page 26: Linear Probing

Primary Clustering

• As the load factor increased, the probability of a collision increased

• Justification:
  – suppose that a chain is of length m
  – an insertion either into any bin occupied by the chain or into the locations immediately before or after it will increase the length of the chain

Page 27: Linear Probing

Primary Clustering

• For example, using the last two digits as the hash value, consider the following hash table

• Any insertion into bins 29 through 34 will increase the length of the chain

Bin:   ...  28  29  30  31  32  33  34  35 ...
Entry: ...   -   - 230 531 730 432   -   - ...

Page 28: Linear Probing

Primary Clustering

• Consequently, if a chain is of size m, then the probability that it will be increased in length is (m + 2)/M where M is the size of the hash table

• The more a chain grows, the more likely it will grow in the future
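As a worked instance, take the chain from the example (bins 30 through 33, so m = 4). The value M = 100 is an assumption here, suggested by the two-digit hash, and every bin is assumed to be equally likely to be chosen by the hash function:

```latex
P(\text{chain grows}) = \frac{m + 2}{M} = \frac{4 + 2}{100} = 0.06,
\qquad
\text{and once } m = 5:\quad \frac{5 + 2}{100} = 0.07
```

Each time the chain grows, its chance of growing again rises by 1/M.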

Page 29: Linear Probing

Primary Clustering

• The length of these chains will affect the number of probes required to perform insertions, accesses, or removals

• It is possible to estimate the average number of probes for a successful search, where λ is the load factor:

$$\frac{1}{2}\left(1 + \frac{1}{1 - \lambda}\right)$$

• For example: if λ = 0.5, we require 1.5 probes

Page 30: Linear Probing

Primary Clustering

• The number of probes for an unsuccessful search or for an insertion is higher:

$$\frac{1}{2}\left(1 + \frac{1}{(1 - \lambda)^2}\right)$$

• For 0 ≤ λ < 1, we have (1 – λ)² ≤ 1 – λ, and therefore the reciprocal will be larger

• Again, if λ = 0.5 then we require 2.5 probes

Page 31: Linear Probing

Primary Clustering

• The following plot shows how the number of required probes increases
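The growth can also be tabulated directly from the standard linear-probing estimates, ½(1 + 1/(1 − λ)) for a successful search and ½(1 + 1/(1 − λ)²) for an unsuccessful search or insertion. The function names and the sample load factors below are chosen for illustration only.

```python
def probes_successful(lam):
    """Estimated probes for a successful search at load factor lam (< 1)."""
    return 0.5 * (1 + 1 / (1 - lam))


def probes_unsuccessful(lam):
    """Estimated probes for an unsuccessful search or an insertion."""
    return 0.5 * (1 + 1 / (1 - lam) ** 2)


print("load   successful   unsuccessful")
for lam in [0.1, 0.25, 0.5, 2 / 3, 0.75, 0.9]:
    print(f"{lam:4.2f}   {probes_successful(lam):10.2f}   "
          f"{probes_unsuccessful(lam):12.2f}")
```

Both estimates stay small for moderate load factors and rise steeply as λ approaches 1, with the unsuccessful case rising much faster.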

Page 32: Linear Probing

Primary Clustering

• Our goal was to keep all operations O(1)

• Unfortunately, as λ grows, so does the run time
• One solution is to keep the load factor under a given bound
• If we choose λ = 2/3, then the expected number of probes is 2 for a successful search and 5 for an unsuccessful search

Page 33: Linear Probing

Primary Clustering

• Therefore, we have three choices:
  – Choose M large enough so that we will not pass this load factor
  – Double the number of bins if the chosen load factor is reached
  – Choose a different strategy from linear probing

Page 34: Linear Probing

Primary Clustering

• The first solution (choose M sufficiently large) is most useful if we know all the possible entries

• The second (doubling) is only useful if we have an environment where we can dynamically allocate memory

• For the third, we will look at quadratic probing and double hashing
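The second option (doubling) can be sketched as a small class. Everything here is an assumption for illustration: the class name `LinearProbingSet`, the default capacity of 8, the bound of 2/3, the use of Python's built-in `hash`, and `None` as the empty-bin marker (so `None` itself cannot be stored).

```python
class LinearProbingSet:
    """Set with linear probing that doubles once the load bound is hit."""

    def __init__(self, capacity=8, max_load=2 / 3):
        self.table = [None] * capacity
        self.count = 0
        self.max_load = max_load    # must stay below 1 so probing terminates

    @staticmethod
    def _place(table, key):
        """Probe for key; store it if absent. Return True if stored."""
        M = len(table)
        i = hash(key) % M
        while table[i] is not None:
            if table[i] == key:
                return False        # already present
            i = (i + 1) % M
        table[i] = key
        return True

    def _resize(self, new_capacity):
        # Every entry must be rehashed: its bin depends on the table size.
        new_table = [None] * new_capacity
        for key in self.table:
            if key is not None:
                self._place(new_table, key)
        self.table = new_table

    def add(self, key):
        # Grow first if this insertion could push the load past the bound.
        if (self.count + 1) / len(self.table) > self.max_load:
            self._resize(2 * len(self.table))
        if self._place(self.table, key):
            self.count += 1


s = LinearProbingSet()
for k in range(6):
    s.add(k)
print(len(s.table), s.count)  # 16 6
```

Copying every entry makes a single resize cost time proportional to the table size, but because the capacity doubles each time, that cost is spread over the many insertions that preceded it.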