Linear Probing

Open Addressing: Linear Probing

Data Structures and Algorithms

Linear Probing

• The easiest method to probe the bins of the hash table is to search forward linearly

• Assume we are inserting into bin i:
  – if bin i is empty, we occupy it
  – otherwise, check bin i + 1, i + 2, and so on, until an empty bin is found
  – if we reach the end of the hash table, we start at the front (bin 0)
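The probing rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the names `insert` and `hash_fn`, and the use of `None` to mark an empty bin, are assumptions made for the example.

```python
def insert(table, key, hash_fn):
    """Store key in the first empty bin at or after bin hash_fn(key)."""
    size = len(table)
    start = hash_fn(key) % size
    for step in range(size):
        bin_index = (start + step) % size  # past the last bin, wrap to bin 0
        if table[bin_index] is None:       # None marks an empty bin (assumption)
            table[bin_index] = key
            return bin_index
    raise OverflowError("every bin is occupied")


table = [None] * 4
print(insert(table, 6, lambda k: k))  # 6 % 4 = 2, and bin 2 is empty
print(insert(table, 2, lambda k: k))  # bin 2 is taken, so bin 3 is used
print(insert(table, 3, lambda k: k))  # bin 3 is taken, so the probe wraps to bin 0
```

The loop is bounded by the table size, so a full table raises an error instead of probing forever.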

Linear Probing

• For example, suppose that our hash function converts a 2-digit integer into a single digit by taking the least-significant digit

• Not necessarily a bad idea: for most natural data, the least-significant digit is approximately random

• Such distributions are used to catch tax fraud

Linear Probing: Insertions

• Insert the numbers

81, 70, 97, 60, 51, 38, 89, 68, 24

into the initially empty hash table:

0 1 2 3 4 5 6 7 8 9

Linear Probing: Insertions

• We can easily insert 81, 70, and 97 into their corresponding bins:

Bin:    0  1  2  3  4  5  6  7  8  9
Entry: 70 81 -- -- -- -- -- 97 -- --


• Inserting 60 causes a collision in bin 0; therefore, we check:
  – bin 1 (also full), and
  – bin 2 (empty)

Bin:    0  1  2  3  4  5  6  7  8  9
Entry: 70 81 60 -- -- -- -- 97 -- --


• Inserting 51 also causes a collision, this time in bin 1; therefore, we check:
  – bin 2 (also full), and
  – bin 3 (empty)

Bin:    0  1  2  3  4  5  6  7  8  9
Entry: 70 81 60 51 -- -- -- 97 -- --


• 38 and 89 can be placed into bins 8 and 9 respectively without collisions

Bin:    0  1  2  3  4  5  6  7  8  9
Entry: 70 81 60 51 -- -- -- 97 38 89


• Inserting 68 causes a collision in bin 8, and therefore we:
  – check bins 9, 0, 1, 2, 3, and finally 4, which is empty
  – insert 68 into bin 4

Bin:    0  1  2  3  4  5  6  7  8  9
Entry: 70 81 60 51 68 -- -- 97 38 89


• Inserting 24 causes a collision in bin 4; however, the next bin (5) is empty

Bin:    0  1  2  3  4  5  6  7  8  9
Entry: 70 81 60 51 68 24 -- 97 38 89
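The nine insertions can be replayed with a short script that records which bins each key examines. It is only a sketch under the assumptions of this example: the hash is the least-significant digit (`key % 10`), `None` marks an empty bin, and the helper name `replay` is made up for illustration.

```python
def replay(keys, size=10):
    """Insert keys by linear probing and record the bins each one examines."""
    table = [None] * size
    trace = []
    for key in keys:                 # assumes fewer keys than bins
        bin_index = key % size
        probed = [bin_index]
        while table[bin_index] is not None:
            bin_index = (bin_index + 1) % size   # next bin, wrapping to bin 0
            probed.append(bin_index)
        table[bin_index] = key
        trace.append((key, probed))
    return table, trace


table, trace = replay([81, 70, 97, 60, 51, 38, 89, 68, 24])
for key, probed in trace:
    print(key, probed)
print(table)
```

The trace for 68 lists bins 8, 9, 0, 1, 2, 3, 4, matching the walk-through above, and the final table leaves only bin 6 empty.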

Linear Probing: Searching

• Testing for membership is similar to insertions

• Start at the appropriate bin, and continue searching forward until either:
  – the item is found,
  – an empty bin is found, or
  – we have traversed the entire array

• The last case will only occur if the hash table is full

Linear Probing: Searching

• Searching for 68, we first examine bin 8, then 9, 0, 1, 2, 3, and 4, finding 68 in bin 4

• Searching for 23, we examine bins 3, 4, and 5, and then find that bin 6 is empty, so 23 is not in the table

Bin:    0  1  2  3  4  5  6  7  8  9
Entry: 70 81 60 51 68 24 -- 97 38 89
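The three stopping conditions translate directly into a membership test. The sketch below assumes the same conventions as the example table (hash is `key % size`, `None` marks an empty bin); the function name `contains` is chosen for illustration.

```python
def contains(table, key):
    """Return True if key is stored in a linear-probing table."""
    size = len(table)
    start = key % size
    for step in range(size):
        entry = table[(start + step) % size]
        if entry is None:      # an empty bin ends the search: key is absent
            return False
        if entry == key:       # the item is found
            return True
    return False               # every bin examined: only possible in a full table


table = [70, 81, 60, 51, 68, 24, None, 97, 38, 89]
print(contains(table, 68))   # examines bins 8, 9, 0, 1, 2, 3, 4
print(contains(table, 23))   # examines bins 3, 4, 5, then stops at empty bin 6
```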

Linear Probing: Removing

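• We cannot simply remove elements from the hash table

• For example, if we delete 89 by simply emptying bin 9, we can no longer find 68: the search starts at bin 8, moves on to bin 9, finds it empty, and stops

Bin:    0  1  2  3  4  5  6  7  8  9
Entry: 70 81 60 51 68 24 -- 97 38 89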


• However, we cannot simply move all entries up to fill the gap

• Moving 70 to bin 9 would make it impossible to find 70

0 1 2 3 4 5 6 7 8 9

70 81 60 51 68 24 97 38 89

81 60 51 68 24 97 38 70


• Instead, we must probe forward from the gap, moving an entry into the gap only if doing so does not place it before the bin it hashes to

• For example, we remove 89

Bin:    0  1  2  3  4  5  6  7  8  9
Entry: 70 81 60 51 68 24 -- 97 38 --


• We probe forward until we find an entry which can be moved into bin 9

• We cannot move 70, 81, 60, or 51, but we can move 68

Bin:    0  1  2  3  4  5  6  7  8  9
Entry: 70 81 60 51 -- 24 -- 97 38 68


• Next, we search forward again, and note that 24 can be moved into bin 4

• The next bin (6) is already empty, and therefore we are finished

Bin:    0  1  2  3  4  5  6  7  8  9
Entry: 70 81 60 51 24 -- -- 97 38 68


• Suppose we now remove 60

• Begin searching forward from bin 0

Bin:    0  1  2  3  4  5  6  7  8  9
Entry: 70 81 60 51 24 -- -- 97 38 68


• We find 60 in bin 2, and therefore we remove it

• We search forward and find that we can move 51 into bin 2

Bin:    0  1  2  3  4  5  6  7  8  9
Entry: 70 81 51 -- 24 -- -- 97 38 68


• We cannot move 24 into bin 3, since it hashes to bin 4

• The next bin (5) is empty, therefore we are finished

Bin:    0  1  2  3  4  5  6  7  8  9
Entry: 70 81 51 -- 24 -- -- 97 38 68
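The removal procedure walked through above can be sketched as follows. This is one possible formulation, not necessarily the one used in the course: it assumes the hash is `key % size` and that `None` marks an empty bin, and it decides whether an entry may move by comparing circular distances, which is a design choice of this example.

```python
def remove(table, key):
    """Remove key from a linear-probing table, refilling the gap it leaves."""
    size = len(table)
    hole = key % size
    for _ in range(size):              # locate the key, as in a search
        if table[hole] is None:
            return False
        if table[hole] == key:
            break
        hole = (hole + 1) % size
    else:
        return False                   # full table that does not contain key
    table[hole] = None
    probe = (hole + 1) % size
    while table[probe] is not None:    # stop at the first empty bin
        home = table[probe] % size     # the bin this entry hashes to
        # The entry may move only if the hole lies on its probe path, that is,
        # the hole is no farther back from here than the entry's home bin.
        if (probe - home) % size >= (probe - hole) % size:
            table[hole] = table[probe]
            table[probe] = None
            hole = probe
        probe = (probe + 1) % size
    return True


table = [70, 81, 60, 51, 68, 24, None, 97, 38, 89]
remove(table, 89)
print(table)   # 68 moves into bin 9, then 24 moves into bin 4
remove(table, 60)
print(table)   # 51 moves into bin 2; 24 stays in bin 4
```

Because a bin is emptied before the forward probe begins, the `while` loop always reaches an empty bin and terminates.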

Primary Clustering

• We have already observed the following phenomenon:
  – as we insert more elements into the hash table, the contiguous regions get larger

• This results in longer search times

Primary Clustering

• Consider inserting the following entries 81, 70, 97, 63, 76, 38, 85, 68, 21, 9, 55, 73, 57, 60, 72, 74, 85, 16, 61, 7, 49

• Use the number modulo 25 to determine which bin it should occupy

• The first five don’t cause any collisions

Bin:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Entry: -- 76 -- -- -- -- 81 -- -- -- -- -- -- 63 -- -- -- -- -- -- 70 -- 97 -- --

Primary Clustering

• Inserting 38 causes a collision in bin 13

• The next seven do not cause any further collisions

Bin:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Entry: -- 76 -- -- -- 55 81 57 --  9 85 -- -- 63 38 -- -- -- 68 -- 70 21 97 73 --

Primary Clustering

• The next four insertions cause collisions:
  – 60 (bin 10)
  – 72 (bin 22)
  – 74 (bin 24)
  – 85 (bin 10)

• We can safely insert 16 into bin 16

Bin:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Entry: 74 76 -- -- -- 55 81 57 --  9 85 60 85 63 38 -- 16 -- 68 -- 70 21 97 73 72

Primary Clustering

• The remaining insertions all cause collisions:
  – 61 (bin 11)
  – 7 (bin 7)
  – 49 (bin 24)

• The joining of smaller groups into one large group is termed coalescing

Bin:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Entry: 74 76 49 -- -- 55 81 57  7  9 85 60 85 63 38 61 16 -- 68 -- 70 21 97 73 72
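The growth and coalescing of the contiguous regions can be measured by replaying the same insertions. The sketch below assumes the hash is the key modulo 25 and that `None` marks an empty bin; `insert_all` and `cluster_lengths` are hypothetical helper names, and the table is treated as circular because probes wrap from bin 24 to bin 0.

```python
def insert_all(keys, size):
    """Insert keys with linear probing, hashing each key to key % size."""
    table = [None] * size
    for key in keys:                 # assumes fewer keys than bins
        i = key % size
        while table[i] is not None:
            i = (i + 1) % size
        table[i] = key
    return table


def cluster_lengths(table):
    """Lengths of the runs of occupied bins, treating the table as circular."""
    size = len(table)
    empty = [i for i, entry in enumerate(table) if entry is None]
    if not empty:
        return [size]
    lengths = []
    # Each run sits between two consecutive empty bins (wrapping past the end).
    for left, right in zip(empty, empty[1:] + [empty[0] + size]):
        if right - left > 1:
            lengths.append(right - left - 1)
    return lengths


keys = [81, 70, 97, 63, 76, 38, 85, 68, 21, 9, 55, 73, 57,
        60, 72, 74, 85, 16, 61, 7, 49]
for count in (5, 13, 18, 21):
    print(count, sorted(cluster_lengths(insert_all(keys[:count], 25))))
```

After 5, 13, 18, and 21 insertions the longest run has length 1, 4, 7, and 12 respectively; the final table holds only three runs, one of which wraps around from bin 20 to bin 2.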

Primary Clustering

• As the load factor increased, the probability of a collision increased

• Justification:
  – suppose that a chain is of length m
  – an insertion either into any bin occupied by the chain or into the locations immediately before or after it will increase the length of the chain

Primary Clustering

• For example, using the last two digits as the hash value, consider the following hash table

• Any insertion into bins 29 through 34 will increase the length of the chain

Bin:   ...  28  29  30  31  32  33  34  35 ...
Entry:      --  -- 230 531 730 432  --  --

Primary Clustering

• Consequently, if a chain is of size m, then the probability that it will be increased in length is (m + 2)/M where M is the size of the hash table

• The more a chain grows, the more likely it will grow in the future
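As a small worked instance of the (m + 2)/M estimate, the snippet below evaluates it for the chain in bins 30 through 33. It assumes a uniformly distributed hash and a table of M = 100 bins (one per two-digit hash value); the table size is an assumption, since the slides do not state it, and `growth_probability` is a name made up for this sketch.

```python
from fractions import Fraction


def growth_probability(chain_length, table_size):
    """Chance that the next insertion lengthens a chain of the given length.

    Counts the chain's own bins plus the bins immediately before and after it,
    capped at the table size.
    """
    return Fraction(min(chain_length + 2, table_size), table_size)


print(growth_probability(4, 100))   # bins 29 through 34: six bins out of 100
print(growth_probability(5, 100))   # once the chain has grown by one bin
```

Each time the chain grows, the numerator grows with it, which is the feedback effect described above.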

Primary Clustering

• The length of these chains will affect the number of probes required to perform insertions, accesses, or removals

• It is possible to estimate the average number of probes for a successful search, where $\lambda$ is the load factor:

$$\frac{1}{2}\left(1 + \frac{1}{1 - \lambda}\right)$$

• For example: if $\lambda = 0.5$, we require 1.5 probes

Primary Clustering

• The number of probes for an unsuccessful search or for an insertion is higher:

$$\frac{1}{2}\left(1 + \frac{1}{(1 - \lambda)^2}\right)$$

• For $0 \le \lambda < 1$, we have $(1 - \lambda)^2 \le 1 - \lambda$, and therefore the reciprocal will be larger

• Again, if $\lambda = 0.5$, then we require 2.5 probes

Primary Clustering

• The following plot shows how the number of required probes increases
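The same growth can be tabulated directly from the two estimates, (1 + 1/(1 − λ))/2 for a successful search and (1 + 1/(1 − λ)²)/2 for an unsuccessful search or insertion. The sketch below is illustrative only; the function names and the chosen load factors are assumptions, and `Fraction` is used so that the arithmetic is exact.

```python
from fractions import Fraction


def successful_probes(load):
    """Estimated probes for a successful search at load factor `load` (< 1)."""
    return (1 + 1 / (1 - load)) / 2


def unsuccessful_probes(load):
    """Estimated probes for an unsuccessful search or an insertion."""
    return (1 + 1 / (1 - load) ** 2) / 2


loads = (Fraction(1, 4), Fraction(1, 2), Fraction(2, 3),
         Fraction(3, 4), Fraction(9, 10))
print("load  successful  unsuccessful")
for load in loads:
    print(f"{float(load):4.2f}  {float(successful_probes(load)):10.2f}  "
          f"{float(unsuccessful_probes(load)):12.2f}")
```

Both estimates stay small at low load factors and grow without bound as the load factor approaches 1, with the unsuccessful case growing much faster.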

Primary Clustering

• Our goal was to keep all operations O(1)

• Unfortunately, as $\lambda$ grows, so does the run time

• One solution is to keep the load factor under a given bound

• If we choose $\lambda = 2/3$, then the number of probes for a successful and an unsuccessful search is 2 and 5, respectively

Primary Clustering

• Therefore, we have three choices:
  – Choose M large enough so that we will not pass this load factor
  – Double the number of bins if the chosen load factor is reached
  – Choose a different strategy from linear probing

Primary Clustering

• The first solution (choose M sufficiently large) is most useful if we know all the possible entries

• The second (doubling) is only useful if we have an environment where we can dynamically allocate memory

• For the third, we will look at quadratic probing and double hashing
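The second option (doubling) can be sketched as a small class that rebuilds the table whenever an insertion would push the load factor past a bound. Everything here is an assumption made for illustration: the class name `LinearProbingSet`, the starting size of 8 bins, the bound of 2/3, and non-negative integer keys hashed by `key % bins`.

```python
class LinearProbingSet:
    """Set of integers using linear probing with a bounded load factor."""

    def __init__(self, bins=8, max_load=2 / 3):
        self.table = [None] * bins
        self.count = 0
        self.max_load = max_load

    @staticmethod
    def _place(table, key):
        """Probe forward from key % len(table); return False if already present."""
        i = key % len(table)
        while table[i] is not None:
            if table[i] == key:
                return False
            i = (i + 1) % len(table)
        table[i] = key
        return True

    def insert(self, key):
        # Double the number of bins before the load factor would pass the bound.
        if (self.count + 1) / len(self.table) > self.max_load:
            self._resize(2 * len(self.table))
        if self._place(self.table, key):
            self.count += 1

    def _resize(self, bins):
        # Every entry must be re-inserted, because key % bins changes with bins.
        new_table = [None] * bins
        for key in self.table:
            if key is not None:
                self._place(new_table, key)
        self.table = new_table


numbers = LinearProbingSet()
for value in range(100):
    numbers.insert(value)
print(numbers.count, len(numbers.table))
```

Because the load factor never exceeds the bound, `_place` always finds an empty bin. The cost is that each resize touches every stored entry, which is why this option needs an environment where memory can be allocated dynamically.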
