24
CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man Steve Wolfman Winter Quarter 2000

CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

  • Upload
    boaz

  • View
    26

  • Download
    0

Embed Size (px)

DESCRIPTION

CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man. Steve Wolfman Winter Quarter 2000. Today’s Outline. Discuss the midterm Hashing and Hash Tables. Dictionary operations create destroy insert find delete Stores values associated with user-specified keys - PowerPoint PPT Presentation

Citation preview

Page 1: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

CSE 326: Data StructuresLecture #14

Whoa… Good Hash, Man

Steve Wolfman

Winter Quarter 2000

Page 2: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Today’s Outline

• Discuss the midterm• Hashing and Hash Tables

Page 3: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Reminder: Dictionary ADT• Dictionary operations

– create– destroy– insert– find– delete

• Stores values associated with user-specified keys– values may be any (homogenous) type– keys may be any (homogenous) comparable type

• Zasha– interesting ID, but not

enough ooomph!

• Bone– More oomph, less high

scoring Scrabble action

• Wolf– the perfect mix of oomph

and Scrabble value

insert

find(Wolf)

• Darth - formidable

• Wolf - the perfect mix of oomph

and Scrabble value

Page 4: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Implementations So Far

• Unsorted list O(1) O(n) O(n)• Trees O(log n) O(log n) O(log n)

• Special case: O(1) O(1) O(1)

integer keys

between 0 and k

insert deletefind

How about O(1) insert/find/delete for any key type?

Page 5: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Hash Table Goal

Andrew

We can do:

a[2] = “Andrew”

k-1

3

2

1

0

Andrew

We want to do:

a[“Steve”] = “Andrew”

“Ed”

“Brad”

“Steve”

“Nic”

“Zasha”

Page 6: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Hash Table Approach

But… is there a problem is this pipe-dream?

f(x)

Zasha

Steve

Nic

Brad

Ed

Page 7: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Hash Table Dictionary Data Structure

• Hash function: maps keys to integers– result: can quickly find the

right spot for a given entry

• Unordered and sparse table– result: cannot efficiently list

all entries,

f(x)ZashaSteve

NicBrad

Ed

Page 8: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Hash Table Terminology

f(x)

Zasha

Steve

Nic

Brad

Ed

hash function

collision

keys

load factor = # of entries in table

tableSize

Page 9: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Hash Table CodeFirst Pass

Value & find(Key & key) { int index = hash(key) % tableSize; return Table[tableSize];}

What should the hash function be?

What should the table size be?

How should we resolve collisions?

Page 10: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

A Good Hash Function…

…is easy (fast) to compute (O(1) and practically fast).…distributes the data evenly (hash(a) % size hash(b) %

size).…uses the whole hash table (for all 0 k < size, there’s an i

such that hash(i) % size = k).

Page 11: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Good Hash Function for Integers

• Choose – tableSize is prime

– hash(n) = n

• Example:– tableSize = 7

insert(4)

insert(17)

find(12)

insert(9)

delete(17)

3

2

1

0

6

5

4

Page 12: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Good Hash Function for Strings?

• Let s = s1s2s3s4…s5: choose

– hash(s) = s1 + s2128 + s31282 + s41283 + … + sn128n

• Problems:– hash(“really, really big”) = well… something really, really big

– hash(“one thing”) % 128 = hash(“other thing”) % 128

Think of the string as a base 128 number.

Page 13: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Making the String HashEasy to Compute

• Use Horner’s Rule

int hash(String s) { h = 0; for (i = s.length() - 1; i >= 0; i--) { h = (si + 128*h) % tableSize; } return h; }

Page 14: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Making the String HashCause Few Conflicts

• Ideas?

Page 15: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Good Hashing: Multiplication Method

• Hash function is defined by size plus a parameter AhA(k) = size * (k*A mod 1) where 0 < A < 1

• Example: size = 10, A = 0.485hA(50) = 10 * (50*0.485 mod 1)

= 10 * (24.25 mod 1) = 10 * 0.25 = 2

– no restriction on size!– if we’re building a static table, we can try several As– more computationally intensive than a single mod

Page 16: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Good Hashing:Universal Hash Function

• Parameterized by prime size and vector:a = <a0 a1 … ar> where 0 <= ai < size

• Represent each key as r + 1 integers where ki < size– size = 11, key = 39752 ==> <3,9,7,5,2>

– size = 29, key = “hello world” ==> <8,5,12,12,15,23,15,18,12,4>

ha(k) = sizekar

iii mod

0

Page 17: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Universal Hash Function: Example

• Context: hash strings of length 3 in a table of size 131

let a = <35, 100, 21>

ha(“xyz”) = (35*120 + 100*121 + 21*122) % 131

= 129

Page 18: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Universal Hash Function

• Strengths:– works on any type as long as you can form ki’s

– if we’re building a static table, we can try many a’s

– a random a has guaranteed good properties no matter what we’re hashing

• Weaknesses– must choose prime table size larger than any ki

Page 19: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Alternate Universal Hash Function

• Parameterized by k, a, and b:– k * size should fit into an int

– a and b must be less than size

Hk,a,b(x) = ksizekbxa /mod

Page 20: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Alternate Universe Hash Function: Example

• Context: hash integers in a table of size 16

let k = 32, a = 100, b = 200

hk,a,b(1000) = ((100*1000 + 200) % (32*16)) / 32

= (100200 % 512) / 32

= 360 / 32

= 11

Page 21: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Universal Hash Function

• Strengths:– if we’re building a static table, we can try many a’s

– random a,b has guaranteed good properties no matter what we’re hashing

– can choose any size table

– very efficient if k and size are powers of 2

• Weaknesses– still need to turn non-integer keys into integers

Page 22: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Collisions

• Pigeonhole principle says we can’t avoid all collisions– try to hash without collision m keys into n slots with m > n

– try to put 10 pigeons into 5 holes

• What do we do when two keys hash to the same entry?– open hashing: put little dictionaries in each entry

– closed hashing: pick a next entry to try

shove extra pigeons in one hole!

Page 23: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

To Do

• Form your team and start Project III• Read chapter 5 in the book

Page 24: CSE 326: Data Structures Lecture #14 Whoa… Good Hash, Man

Coming Up

• More hash tables• Disjoint-set union-find ADT• Fourth Quiz (February 10th)