25
CS 310: Hash Table Collision Resolution Chris Kauffman Week 8-1

CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

CS 310: Hash Table Collision Resolution

Chris Kauffman

Week 8-1

Page 2: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Logistics

Reading

I Weiss Ch 20: Hash TableI Weiss Ch 6.7-8: Maps/Sets

HomeworkI HW 1 Due SaturdayI Discuss HW 2 next weekI Questions?

ScheduleI Course Schedule updatedI Midterm: Next week

Thursday

Goals Today

I Hash CodesI Resolving hash table

collisions

Page 3: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Hash Code and Hash Tables

Hash CodesI Class X wants to be put

inside hash tables (hashable)I Must override the

hashCode() and equals(x)methods

I hashCode() should obey thehash contract (equal x’s →equal hash codes)

I hashCode() should try toproduce different hash codesfor unequal x’s

I HashSet will usex.hashCode() andx.equals(y) to trackunique instances of X

Basic HashSet Implementation

class HashSet<T>{T hta[]; int size;boolean contains(T x){

int xhc = x.hashCode();// If xhc out of bounds?xhc = ???;// Is this okay?return

x.equals(this.hta[xhc]);}void add(T x){

int xhc = x.hashCode();// If xhc out of bounds?xhc = ???;// Is this okay?this.hta[xhc] = x;this.size++;

}}

Page 4: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Getting Hash Codes in Bounds for Hash Tables

I hta[] has a fixed sizeI The hash code xhc

can be any integerI Take an absolute

value of xhc ifnegative

I Use modulo to getxhc in bounds

int n = hta.length;hta[abs(xhc) % n] = x;

Note: For mathy reasonswe’ll briefly discuss, usuallymake hash table size n aprime number

Page 5: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Pragmatic Collision Resolution: Separate Chaining

MotivationI Put x in table at hta[xhc]I Problem: What if hta[xhc] is occupied?

Separate ChainingMost of you recognize this problem can be solved simply

I Internal array contains listsI Add x to the list at hta[xhc]

public class HashTable<T>{private List<T> hta[];...

Page 6: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Separate Chaining: Example

Code

String [] sa1 = new String[]{"Chris","Sam","Beth","Dan"

};

SeparateChainHS<String> h =new SeparateChainHS<String>(11);

for(String s : sa1){h.add(s);

}print(h.load());// load = 4 / 11// 0.36363636363636365

load =item countarray length

Load = 0.36

Page 7: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Separate Chaining: ExampleCode

String [] sa2 = new String[]{"Chris","Sam","Beth","Dan","George","Kevin","Nikil","Mark","Dana","Amy","Foo","Spike","Jet","Ed"

};

SeparateChainHS<String> h =new SeparateChainHS<String>(11);

for(String s : sa2){h.add(s);

}

h.load();// load = 14 / 11// 1.2727272727272727

Load = 1.27

Page 8: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Exercise: Implement Separate ChainingI A Set has at most one copy of any element (no duplicates)I Write add/remove/contains for SeparateChainingHSI What are the time complexities of each method?

public class SeparateChainingHS<T>{private List<T> hta[];private int itemCount;

// Constructor, n is initial size of hta[]public SeparateChainingHS(int n){

this.itemCount = 0;this.hta = new List<T>[n];for(int i=0; i<n; i++){

this.hta[i]=new LinkedList<T>();}

}

public void add(T x); // Add x if not alread presentpublic void remove(T x); // Remove x if presentpublic boolean contains(T x); // Return true if x present, false o/w

}

Page 9: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Complexity of Separate Chaining Methods

// Add x to the HashSet if not already presentpublic void add(T x){

int xhc = x.hashCode();int pos = Math.abs(xhc) % this.hta.length;List<T> l = this.hta[ pos ];for(T y : l){ // does l contain x?

if(x.equals(y)){return; // x already present

}}l.add(x); // x not present, add

}

remove()/contains() follow a similar flow.

I Assume l.add(x) is O(1)I Assume fair hash function (distributes well)I Load is the average number of things in each list in the arrayI add/remove/contains must look through Load elements on average

O(Load) = O(itemCount/arraySize)

Page 10: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Rehashing: Lower the Load

I Complexity of hash table methods dependent on the Load

O(Load) = O(itemCount/arraySize)

I Can decrease Load by increasing arraySizeI Allocate a new larger array hta[], prime number sizeI Copy over all active items to the new arrayI Important: determine new locations of each item in the table

because hta.length will change inint pos = abs(x.hashCode()) % hta.length;

I O(N) to rehash where N is the size of the new array

Page 11: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Separate Chaining Viable in Practice

Java’s built-in hash tables use Separate Chaining

I Simple to code, Reasonably efficientI java.util.HashSet / HashMap / Hashtable

all use separate chaining

Weiss’s implementation

I Code shown in Weiss pg 799I Rolled own linked listI No remove (write it yourself)I Part of code distribution

Page 12: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Alternatives to Separate Chaining

Separate Chaining works well but has some disadvantages

I Requires separate data structure (lists)I Involves additional levels of indirection: elements are more

than one memory reference away from the hash table arrayI Adding requires memory allocation for nodes/lists

Alternative: Open Address Hashing

I Store element references directly in hash table arrayI No auxiliary of lists/arrays references from the hash tableI Why do it this way?I How can we handle collisions now?

Page 13: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Open AddressingBasic Design

I Hash table elements stored in array hta (no auxilliary lists)I Probe a sequence of array positions for object

# Generic pseudocode for a probe sequencepos = abs(x.hashCode()) % hta.length;repeat

if hta[pos] is emptyhta[pos] = xreturn

elsepos = someplace else

Design Issues

I Obvious next places to look after pos?I How to indicate an entry is empty?I Limits?

Page 14: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Linear ProbingStart with normal insertion position pos

int pos = Math.abs(x.hashCode()) % hta.length;

Try the following sequence until an empty array element is found

pos, pos+1, pos+2, pos+3, ... pos+i

Process of add(x) in hash table

// General idea of linear probing sequencepos = Math.abs(x.hashCode() % hta.length);if hta[pos] empty, put x thereelse if hta[(pos+1)] empty, put x thereelse if hta[(pos+2)] empty, put x there...

Exercise: Write java code for linear probing

// Insert x using linear probe sequencepublic void add(T x)

Page 15: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Consequences of Open Address Hashing

With linear probingI Can add(x) fail? Under what conditions?I Code for contains(x)?I How does remove(x) work?

Page 16: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Removal in Open Addressing: Follow Chain

|------+------+-----+-------|| Item | Code | Pos | Added ||------+------+-----+-------|| A | 5 | 5 | 1 || B | 6 | 6 | 2 || C | 5 | 7 | 3 || D | 7 | 8 | 4 || E | 5 | 9 | 5 || F | 8 | 10 | 6 || G | 11 | 11 | 7 || H | 12 | 12 | 8 || I | 9 | 13 | 9 ||------+------+-----+-------|

I Suppose remove(X) sets position to nullI What are the booleans assigned to?

h.remove(A); boolean b1 = h.contains(C);h.remove(D); boolean b2 = h.contains(F);h.remove(E); boolean b3 = h.contains(I);

Page 17: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Avoid Breaking Chains in RemovalI Don’t set removed records to nullI Use place-holders, in Weiss it’s HashSet.HashEntry

private static class HashEntry {public Object element; // the elementpublic boolean isActive; // false if marked deletedpublic HashEntry( Object e ) {

this( e, true );}public HashEntry( Object e, boolean i ){

element = e;isActive = i;

}}

Explore weiss/code/HashSet.javaI remove(x) sets isActive to falseI contains(x) treats slot as filledI rehash() ignores inactive entries

Page 18: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Load and Linear ProbingLoad has a big effect on performance in linear probing

I When Inserting xI If h[cx] full, cx++ and repeatI When h is nearly full, scan most of arrayI load ≈ 1→ O(n) for add(x)/contains(x)

TheoremThe average number of cellsexamined during insertion withlinear probing is

12

(1+

1(1− load)2

)Where,

load =item countarray length

0.0 0.2 0.4 0.6 0.8 1.0

15

5050

050

00

load

buck

ets

chec

ked

on in

sert

Page 19: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Why does this happen?

Primary ClusteringMany keys group together, clustersdegrade performance

I Table size 20I Filled cells 5-10, 12I Insert H hashes to 6

I Must put at 11I Insert I hashes to 10

I Must put at 13

I Hashes from 5-13 have clustered

Page 20: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Quadratic Probing

Try the following sequence until an empty array element is found

pos, pos+1^2, pos+2^2, pos+3^2, ... pos+i^2

I Primary clustering fixed: not putting in adjacent cellsI add works up to load ≤ 0.5 (Weiss Theorem 20.4, pg 786)I For load > 0.5 may cycle infinitely over occupied entries (bad!)I Can be done efficiently (Weiss pg 787)I Complexity Not fully understood

I No known relation of load to average cells searchedI Interesting open research problem

Page 21: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Probe Sequence Differences> Math.abs("Marylee".hashCode()) % 115

Linear Probe Quadratic Probe

> Math.abs("Barb".hashCode()) % 115 --> Where?

Page 22: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Hash Tables in Java

java.util.HashMap Map built from hashingjava.util.HashSet Set built from hashingjava.util.Hashtable Map built from hashing, earlier class,

synchronized for multithread apps

Page 23: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Hash Take-Home

I Provide O(1) add/remove/containsI Separate chaining is a pragmatic solution

I Hash buckets have listsI Open Address Hashing

I Look in a sequence of buckets for an objectI Linear probing is one way to do open address hashing

I Simple to implement: look in adjacent bucketsI Performance suffers load approaches 1I Primary clustering hurts performance

I Quadratic probing is another way to do open address hashingI Prevents primary clusteringI Must keep hash half-empty to guarantee successful addI Not fully understood mathematically

Page 24: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Hash Tables are another Container

ContainersI Like arrays, linked lists, trees, hash tablesI Have add(x), remove(x), contains(x) methods

add(x) put x in the DSremoveLast() get rid of "last" itemremove(x) take x out of DS

contains(x) is x in DS?

Speed Comparisons

I Speeds for array or ArrayList?I Speeds for LinkedList?I Speeds for hash table?

Page 25: CS 310: Hash Table Collision Resolutionkauffman/cs310/08-hashing-collision.pdfLogistics Reading I WeissCh20: HashTable I WeissCh6.7-8: Maps/Sets Homework I HW1DueSaturday I DiscussHW2nextweek

Operation Complexities (Speed)

I add(x): put x in the DSI removeLast(): get rid of "last" itemI remove(x): take x out of DSI contains(x): is x in DS?

| | add(x) | removeLast() | remove(x) | contains(x) ||--------------+----------+--------------+-----------+-------------|| ArrayList | O(1) | O(1) | O(n) | O(n) || LinkedList | O(1) | O(1) | O(n) | O(n) || Hash Table | O(1) | X | O(1) | O(1) |

This table is slightly misleadingI Careful of semantics of each operationI Presence/lack of sorting propertyI Set/Map distinctionsI What about space complexity of each?