39
CSC212 Dt St t CSC212 Dt St t Data Structure Data Structure Lecture 20 Lecture 20 Hashing Hashing Instructor: George Instructor: George Wolberg Wolberg Department of Computer Science Department of Computer Science City College of New York City College of New York

CSC212 DtSt tData Structure - City University of New Yorkwolberg/cs212/pdf/CSc212-20-Hashing.pdf · CSC212 DtSt tData Structure Lecture 20 ... and Other Objects Using C++. ... Students

  • Upload
    dinhnhu

  • View
    215

  • Download
    1

Embed Size (px)

Citation preview

CSC212 D t St tCSC212 D t St tData Structure Data Structure

Lecture 20Lecture 20Hashing Hashing

Instructor: George Instructor: George WolbergWolbergDepartment of Computer Science Department of Computer Science

City College of New YorkCity College of New York

Hash TablesHash TablesHash TablesHash Tables

Chapter 12 discusses several ways of storing information in an array, and later searching for the informationinformation.

Hash tables are a common approach to theapproach to the storing/searching problem.

This presentation introducesData StructuresData Structures This presentation introduces

hash tables.and Other Objectsand Other ObjectsUsing C++Using C++

What is a Hash Table ?What is a Hash Table ?What is a Hash Table ?What is a Hash Table ?

The simplest kind of hash table is an array of records.

This example has 701 records.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]

An array of records

. . .

An array of records

What is a Hash Table ?What is a Hash Table ? [ 4 ]What is a Hash Table ?What is a Hash Table ?

E h d h i l

Number 506643548

Each record has a special field, called its key.

I thi l th k i In this example, the key is a long integer field called NumberNumber.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]

. . .

What is a Hash Table ?What is a Hash Table ?[ 4 ]

What is a Hash Table ?What is a Hash Table ?

Th b i ht b

Number 506643548

The number might be a person's identification number and the rest of thenumber, and the rest of the record has information about the person.p

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]

. . .

What is a Hash Table ?What is a Hash Table ?What is a Hash Table ?What is a Hash Table ?

When a hash table is in use, some spots contain valid records, and other spots are "empty".

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .

Inserting a New RecordInserting a New RecordInserting a New RecordInserting a New RecordNumber 580625685

In order to insert a new record, the key must somehow be converted to an array index.

h i d i ll d h The index is called the hash value of the key.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .

Inserting a New RecordInserting a New RecordInserting a New RecordInserting a New RecordNumber 580625685

Typical way create a hash value:(Number mod 701)

Wh t i (580625685 d 701) ?What is (580625685 mod 701) ?

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .

Inserting a New RecordInserting a New RecordInserting a New RecordInserting a New RecordNumber 580625685

Typical way to create a hash value:(Number mod 701)

Wh t i (580625685 d 701) ?3

What is (580625685 mod 701) ?

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .

Inserting a New RecordInserting a New RecordInserting a New RecordInserting a New Record

Th h h l i d f

Number 580625685

The hash value is used for the location of the new recordrecord.

[3][ ]

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .

Inserting a New RecordInserting a New RecordInserting a New RecordInserting a New Record

Th h h l i d f The hash value is used for the location of the new recordrecord.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685

CollisionsCollisionsCollisionsCollisions

H i th d

Number 701466868

Here is another new record to insert, with a hash value of 2of 2.

My hashvalue is [2].value is [2].

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685

CollisionsCollisionsCollisionsCollisions

H i th d

Number 701466868

Here is another new record to insert, with a hash value of 2of 2.

My hashvalue is [2].When a collision occurs, value is [2].

move forward until youfind an empty spot.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685

CollisionsCollisionsCollisionsCollisions

Thi i ll d lli i

Number 701466868

This is called a collision, because there is already another valid record at [2]another valid record at [2].

When a collision occurs,move forward until you

find an empty spot.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685

CollisionsCollisionsCollisionsCollisions

Thi i ll d lli i

Number 701466868

This is called a collision, because there is already another valid record at [2]another valid record at [2].

When a collision occurs,move forward until you

find an empty spot.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685

CollisionsCollisionsCollisionsCollisions

Thi i ll d lli i

Number 701466868

This is called a collision, because there is already another valid record at [2]another valid record at [2].

When a collision occurs,move forward until you

find an empty spot.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685

CollisionsCollisionsCollisionsCollisions

Thi i ll d lli i This is called a collision, because there is already another valid record at [2]another valid record at [2].

The new record goesThe new record goesin the empty spot.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685 Number 701466868

A QuizA QuizA QuizA Quiz

Where would you be placed in this table, if there is no collision? Use your social security number or some

h f i bother favorite number.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322Number 580625685 Number 701466868

. . .

Another Kind of CollisionAnother Kind of CollisionAnother Kind of CollisionAnother Kind of CollisionNumber 155779023

Where would you be placed in this table, if there is no collision? Use your social security number or some

h f i b

My hashvalue is [700].other favorite number. value is [700].

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322Number 580625685 Number 701466868

. . .

Another Kind of CollisionAnother Kind of CollisionAnother Kind of CollisionAnother Kind of CollisionNumber 155779023

Where would you be placed in this table, if there is no collision? Use your social security number or some

h f i b

My hashvalue is [700].other favorite number. value is [700].

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322Number 580625685 Number 701466868

. . .

Searching for a KeySearching for a KeySearching for a KeySearching for a Key

Th d t th t' tt h d t

Number 701466868

The data that's attached to a key can be found fairly quicklyquickly.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685 Number 701466868

Searching for a KeySearching for a KeySearching for a KeySearching for a Key

C l l t th h h l

Number 701466868

Calculate the hash value. Check that location of the array

f th kfor the key.My hash

value is [2].value is [2].

Not me.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685 Number 701466868

Searching for a KeySearching for a KeySearching for a KeySearching for a Key

K i f d til

Number 701466868

Keep moving forward until you find the key, or you reach an empty spotempty spot.

My hashvalue is [2].value is [2].

Not me.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685 Number 701466868

Searching for a KeySearching for a KeySearching for a KeySearching for a Key

K i f d til

Number 701466868

Keep moving forward until you find the key, or you reach an empty spotempty spot.

My hashvalue is [2].value is [2].

Not me.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685 Number 701466868

Searching for a KeySearching for a KeySearching for a KeySearching for a Key

K i f d til

Number 701466868

Keep moving forward until you find the key, or you reach an empty spotempty spot.

My hashvalue is [2].value is [2].

Yes!

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685 Number 701466868

Searching for a KeySearching for a KeySearching for a KeySearching for a Key

Wh th it i f d th

Number 701466868

When the item is found, the information can be copied to the necessary locationthe necessary location.

My hashvalue is [2].value is [2].

Yes!

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685 Number 701466868

Deleting a RecordDeleting a RecordDeleting a RecordDeleting a Record

R d l b d l t d f h h t bl Records may also be deleted from a hash table.

Pleasedelete me.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 506643548Number 233667136Number 281942902 Number 155778322

. . .Number 580625685 Number 701466868

Deleting a RecordDeleting a RecordDeleting a RecordDeleting a Record

R d l b d l t d f h h t bl Records may also be deleted from a hash table. But the location must not be left as an ordinary

" t t" i th t ld i t f ith h"empty spot" since that could interfere with searches.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 233667136Number 281942902 Number 155778322

. . .Number 580625685 Number 701466868

Deleting a RecordDeleting a RecordDeleting a RecordDeleting a Record

R d l b d l t d f h h t bl Records may also be deleted from a hash table. But the location must not be left as an ordinary

" t t" i th t ld i t f ith h"empty spot" since that could interfere with searches. The location must be marked in some special way so

that a search can tell that the spot used to havethat a search can tell that the spot used to have something in it.

[ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 700]Number 233667136Number 281942902 Number 155778322

. . .Number 580625685 Number 701466868

Time AnalysisTime AnalysisTime AnalysisTime Analysis

Without any collisionsWithout any collisions constantconstant constantconstant

With collisionsWith collisionsWith collisionsWith collisionsO(k) where k is the average collisions for itemsO(k) where k is the average collisions for itemsk << i f th blk << i f th blk << n, size of the problemk << n, size of the problem

Improving HashingImproving HashingImproving HashingImproving Hashing

Size of the hashing table when using Size of the hashing table when using division hash division hash functionfunction

i b i h f f 4k 3i b i h f f 4k 3 prime number in the form of 4k+3prime number in the form of 4k+3 Other hashing functionsOther hashing functions

midmid--square multiplicativesquare multiplicative midmid--square, multiplicativesquare, multiplicative Double hashing (instead of linear probing)Double hashing (instead of linear probing)

the 2the 2ndnd hash function for stepping through the arrayhash function for stepping through the arraypp g g ypp g g y Chained hashingChained hashing

using a linked list for each component of the hash table using a linked list for each component of the hash table

SummarySummarySummarySummary

Hash tables store a collection of records with keys. The location of a record depends on the hash value p

of the record's key. When a collision occurs, the next available

location is used. Searching for a particular key is generally quick. When an item is deleted, the location must be

marked in a special way, so that the searches know that the spot used to be used.

Hash Table Exercise

Fi d f t t d tFive records of my past students

- Create a small hash table with size 5 (indexes 0 to 4).

- Insert the five items

- Remove Bill Clinton

Do three searches (for Will Smith Bill Clinton and- Do three searches (for Will Smith, Bill Clinton, and Elizabeth).

Kathy MartinKathy Martin817339024

Took Data Structures in Fall 1993.Grade A.

Hard worker. Always gets things doneon time.

Currently working for ABCin New York City.

Will SmithWill Smith506643973

Took Data Structures in Fall 1995.Grade A.

A bit of a goof-off, but he comes throughin a pinch.

Currently saving the world from alieninvasion.

William “Bill” ClintonWilliam Bill Clinton330220393

Took Data Structures in Fall 1995.Grade B-.

Gets along with mostpeople well.

Been laid off even before the slowdown of the economy.

Elizabeth WindsorElizabeth Windsor092223340

Took Data Structures in Fall 1995.Grade B-.

Prefers to be called “Elizabeth II” or “HerMajesty.” Has some family problems.

Currently working in public relationsnear London.

Al EinsteinAl Einstein699200102

Took CSCI 2270 in Fall 1995.Grade F.

In spite of poor grade, I think there isgood academic ability in Al.

Currently a well-known advocate forpeace.

Presentation copyright 1997 Addison Wesley Longman,For use with Data Structures and Other Objects Using C++by Michael Main and Walter Savitch.

Some artwork in the presentation is used with permission from Presentation Task Force(copyright New Vision Technologies Inc) and Corel Gallery Clipart Catalog (copyrightCorel Corporation, 3G Graphics Inc, Archive Arts, Cartesia Software, Image ClubGraphics Inc, One Mile Up Inc, TechPool Studios, Totem Graphics Inc).

Students and instructors who use Data Structures and Other Objects Using C++ are welcometo use this presentation however they see fit, so long as this copyright notice remainsintact.

THE ENDTHE ENDTHE ENDTHE END