33
1 Chapter 11 Chapter 11 Hash Tables Hash Tables Part 1

1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

Embed Size (px)

Citation preview

Page 1: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

11

Chapter 11 Chapter 11 Hash TablesHash Tables

Part 1

Page 2: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

22//4848

Introduction Very VERY fast way to build and

access tables (and records in files too).

Provide nearly O(1) performance. Simple to do too. Extremely fast; significantly faster

than trees, which are O(log2n)!

Page 3: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

33//4848

More…More…

Similar to arrays – which is a disadvantage.

Do need to have some idea how many items will populate the hash table so table size can be determines.

Also quite difficult (makes no sense) to access sequentially (as you will see).

But you will know the size of the table and hence the size of your ‘database.’

Page 4: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

44//4848

Introduction to HashingIntroduction to Hashing Often called Randomizing… We will have a range of (input) values (e.g.

state names, account numbers, String value, etc.) that will be mapped into a range of array values.

The function that will map the key values of these records, objects, etc. to table locations (or relative file locations) is called a hash function.

So, items (objects, records, attribute, etc.) will ‘hash’ to locations in the hash table based on a value within a record / object.

Then, the record / object will be moved into and occupy those locations.

Page 5: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

55//4848

Introduction to Hashing - Introduction to Hashing - ExampleExample

Let’s take objects of type Person that have an SSAN attribute (field) in it.

We may take the last four digits, like 1234, and divide (mod) that by some prime number, say 47, such that ALL objects will hash to values between, say, 0 and 46.

So, given an input object, we will access this key value, hash to an index between 0 and 46 and then move the entire object (or whatever it is…) into this location (indexed by the hashed-to value (0 to 46).

Page 6: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

66//4848

Hashing: Exception CasesHashing: Exception Cases Existing Key: Sometimes there is an attribute

within an object that can be directly used as an index into the hash table without invoking a hashing function.

Examples: e.g. (from your book) employee number in an object or record, or, in a laboratory, an ‘experiment’ number attribute, which might start with a lab number or experiment number 0 or 1.

In these cases, the index number into the hash table

is the same as the key value, and we don’t have to actually ‘hash’ to a table value.

We can use this integer as the index into the table we are building.

Page 7: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

77//4848

Introduction Hashing Introduction Hashing AlgorithmsAlgorithms

So to ultimately store an object into a hash table (or to a location on disk), we need to arrive at a number, an integer that will be an index into the hash table or the address of the record on disk.

We can start with numbers and restrict the range of acceptable values by using the mod function (remainder method).

We discussed this a couple of slides back.

But sometimes our input into the hashing algorithm may not be a number.

Page 8: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

88//4848

Hashing: Converting Words to Hashing: Converting Words to NumbersNumbers

Our input into a hashing algorithm may sometimes be characters or a String. So we may want to convert them to numeric form to facilitate randomizing.

One way to convert letters to numbers is to translate: a = 1; b = 2, t = 20, s = 19. Add the total for a

word such as cats, which yields a 43. So ‘cats’ would be stored in location 43.

Unfortunately, it may be that many words might ‘hash’ to 43. These are called ‘synonyms.’

But synonyms cannot all occupy the same single space…So what to do? (Coming soon!)

Page 9: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

99//4848

Introduction Hashing Introduction Hashing Algorithms Algorithms

Given a key (attribute) in an object (or record), we need a hash function that will map all possible keys in the input space into a specific location within the bounds of the hash table.

What we’d like is to hash to a table location and discover this location is unoccupied and then move the object into the table at the hashed-to location.

It just doesn’t always work that way in practice.

Page 10: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

1010//4848

Introduction Hashing Algorithms Introduction Hashing Algorithms (Collisions and Synonyms – (Collisions and Synonyms –

TerminologyTerminology)) When one or more keys hash to the same

hash table location, we call this situation a collision.

Those elements (objects, records, …) that do hash to this location are called synonyms.

So, we need a mechanism (algorithm) to deal with synonyms.

Page 11: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

1111//4848

Introduction Hashing Introduction Hashing AlgorithmsAlgorithms So, practical considerations require us to map key

values (including synonyms) into hash table locations that may NOT be one-to-one.

Also, all input values themselves must map to the range of hash table entries (the table size) either directly or via a hashing algorithm.

Thus, we may have something like: Arrayindex = someNumber % arraySize or

Arrayindex = someKeyConvertedToANumber % arraySize

Page 12: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

1212//4848

Introduction Hashing Introduction Hashing AlgorithmsAlgorithms

Arrayindex = someNumber % arraySize classic example of a hashing function!

Called Division-Remainder Method.

This algorithm hashes (converts) a ‘number’ in a large range (naturally occurring values – say social security numbers) into a number in a much smaller range (the table size).

(As one might expect, there is a very large number of hashing functions available. (See Donald Knuth))

Page 13: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

1313//4848

Hashing and Collision Hashing and Collision ExampleExample

So, we have: Arrayindex = someNumber % arraySize

So, any integer mod arraySize results in an Arrayindex range of 0 to arraySize-1.

5280 (number of feet in a mile) mod 51 yields a number between 0 (if 51 divides 5280 evenly) and 50.

Any number mod 51 yields a number between 0 and 50. Period!

Again, this particular hashing algorithm is called the Division Remainder method!

Consider the following exercise:

Page 14: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

1414//4848

Hashing and Collision Hashing and Collision ExampleExample Clearly, we do not have space for all naturally-

occurring numbers uniquely. Consider an attribute: 1. 42000 = QUOT = 16, REM=0 => index = 0 125 2. 42200 = QUOT = 17, REM=75 => index = 75 125 3. 42200 = QUOT = 17, REM=95 => index = 95 125 4. 42249 = QUOT = 17, REM=124 => index = 124

LOOKS GOOD...

ALL KEY-TO-ADDRESSS TRANSFORMATIONS SEEM TO HASH TO ADDRESSES BETWEEN 1 AND 125.

Actually, we ‘forced’ all input attributes to hash into this range.

Page 15: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

1515//4848

Hashing and Collision Hashing and Collision ExampleExample

BUT: CONSIDER THE KEYS: 42125, 42250, 42375, ... ALL HASH TO THE SAME ADDRESS!!!

And 40,001, 42,126, 42,151 ALL HASH TO 1 THAT IS, REL KEY = 1 or index = 1.

THUS WE WILL HAVE / may have MANY keys that will hash to the same address (location) in the hash table.

Those input integers (integers in this case) are examples of synonyms!

Page 16: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

1616//4848

Hashing and Collision Hashing and Collision ExampleExample

This oftentimes arises from a large range of input numbers mapping into much smaller target array sizes.

This is the price we pay to save space. We cannot reserve a ‘spot’ for every possible value

that just ‘might’ hash to a value in a large range. Our table would have to be much too large to be

practical.

There is no guarantee a key will not hash to the same location occupied by another object.

As stated, when two or more keys hash to the same location, we have a collision.

So, what do we do? Only one object can occupy a single location in the hash table.

Page 17: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

1717//4848

Organization of Chapter 11Organization of Chapter 11 Much of the remainder of this chapter is

devoted to collision algorithms. Then, much later in the chapter, your author

goes back to hashing algorithms.

So, for now, only consider the division remainder hashing algorithm (dividing by a prime number).

(I realize we did not mention prime numbers, but we will get back to this later. Please accept this simple division / remainder hashing algorithm for the time being.)

Page 18: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

1818//4848

Examples of Collision Examples of Collision AlgorithmsAlgorithms

Open Addressing – three schemes Linear probing Quadratic probing, and Double hashing

Page 19: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

1919//4848

1. Open Addressing – 1. Open Addressing – Linear Linear ProbeProbe

If we hash to an address that is occupied, we merely add 1 and place the object in the next slot in the hash table.

Hash Algorithm: arrayIndex = key % arraySize.

All operations require numeric values or something converted to numeric values (such as a String).

This scheme will work well if the hash table is not more than half full or at most two-thirds full.

As table becomes more full, Open Addressing does have problems…

Page 20: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

2020//4848

Open Addressing – Open Addressing – LinearLinear ProbeProbe

Major problems with linear probe: Clustering

If a number of items hash to the same location, the linear probe merely places them in the ‘next’ spot, which, unfortunately is the home address of some other key. But for a table not full, this may yield good performance.

As table becomes more full, the probability of clustering increases dramatically.

Performance can become quite poor as the number of table entries and table size approach each other.

Duplicates can be included – at your peril…

Page 21: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

2121//4848

Open Addressing – Open Addressing – LinearLinear ProbeProbe

Clustering can result in very long probe lengths especially when the table starts to become full.

Yet, if table is large enough such that perhaps at most 2/3 to ¾ will be filled, the simplicity may be worth the potential degradation in performance.

Page 22: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

2222//4848

ExampleExample Let’s look at some code and as we do

this; look at the find() method.

Page 23: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

2323//4848

// hash.java

// Note: this is just the data item (object here) to be stored in the hash table…// demonstrates hash table with linear probingclass DataItem { // (could have more data) private int iData; // data item (key)//-------------------------------------------------------------- public DataItem(int ii) // constructor { iData = ii; }//-------------------------------------------------------------- public int getKey() { return iData; }//-------------------------------------------------------------- } // end class DataItem

Linear Probe Collision AlgorithmWe will look at some of the methods here….

Page 24: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

2424//4848

class HashTable { private DataItem[ ] hashArray; // array holds hash table pointer private int arraySize; private DataItem nonItem; // for deleted items// ------------------------------------------------------------- public HashTable(int size) // constructor Observe… { arraySize = size; hashArray = new DataItem[arraySize]; // creates the hash table (array) nonItem = new DataItem(-1); //creates an item (object) with -1 value }// end Constructor// ------------------------------------------------------------- public void displayTable() { System.out.print("Table: "); for(int j=0; j<arraySize; j++) { if(hashArray[j] != null) System.out.print(hashArray[j].getKey() + " "); What does this do and how else does it do it? System.out.print("** "); }// end for System.out.println(""); }// end displayTable()// -------------------------------------------------------------

Page 25: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

2525//4848

public int hashFunc(int key) { return key % arraySize; // hash function }// ------------------------------------------------------------- public void insert(DataItem item) // insert a DataItem // (assumes table not full) { int key = item.getKey(); // extract key from object to be inserted into table int hashVal = hashFunc(key); // hash the key (hopefully, cell is empty). while(hashArray[hashVal] != null && // implies location is occupied hashArray[hashVal].getKey() != -1)//-1 no more entries to insert { // if true (occupied), undertake linear probe!!! ++hashVal; // go to next cell (open addressing…) hashVal %= arraySize; // wraparound if necessary } hashArray[hashVal] = item; // insert item // if open, insert value } // end insert()// ------------------------------------------------------------- -1 also used to indicate logically deleted item… (ahead)

Page 26: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

2626//4848

Searching Hash Table Searching Hash Table Must use same scheme to search as

to build!!

Page 27: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

2727//4848

Let’s look at find() Let’s look at find() (next slide)(next slide) We first need to hash the item to be

searched for to get key into hash table Thus, we called hashFunc().

Then, given the index. We look to see if the location in the hash table is empty. If empty, in this scheme, item is not there!

Return a suitable message to client. If non-empty, it checks to see if the value is

the one we are looking for; if so, return it; if not, loop.

Continue until hit or empty slot in hash table.

Next slide has the code for find()

Page 28: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

2828//4848

public DataItem find(int key) // find item with key { int hashVal = hashFunc(key); // hash the key; get hashVal

while(hashArray[hashVal] != null) // until empty cell, { if(hashArray[hashVal].getKey() == key) // found the key? return hashArray[hashVal]; // yes, return item; leave method

++hashVal; // go to next cell if occupied. hashVal %= arraySize; // wraparound if necessary } return null; // can't find item }// ------------------------------------------------------------- } // end class HashTable

Be aware, table could be full!!! Then what??

Page 29: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

2929//4848

Let’s look at delete()Let’s look at delete() Delete marks the item as ‘logically deleted’ by

moving a -1 value (other flags / values could be used….) to the key value (often a status byte is used too…)

Later, the hash table can be reorganized where logically deleted entries are physically deleted and the hash table rebuilt…Lots of complexity here…. Not a free lunch and is likely not done immediately.

Discuss: What if there is a chain of synonyms and, say, the second one is deleted? What to do?

Do you delete an item in its ‘home address’ (thus moving ‘null’ into this location if there are synonyms that follow?

How do you know if the next location is a synonym or the home address of a real key?

Page 30: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

3030//4848

public DataItem delete(int key) // delete a DataItem{ int hashVal = hashFunc(key); // hash the key

while(hashArray[hashVal] != null) // until empty cell, { // found the key? if(hashArray[hashVal].getKey() == key) { DataItem temp = hashArray[hashVal]; // save item for return hashArray[hashVal] = nonItem; // delete item; -1? return temp; // return item } ++hashVal; // go to next cell hashVal %= arraySize; // wraparound if necessary }// end while() return null; // can't find item } // end delete()// -------------------------------------------------------------(// What if hash table is full???)

Page 31: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

3131//4848

class HashTableApp >>>>NEXT THREE SLIDES >>>>>>>>>>>This is the entire application……………..HashApp uses a user interface… { public static void main(String[] args) throws IOException { DataItem aDataItem; int aKey, size, n, keysPerCell; // get sizes System.out.print("Enter size of hash table: "); size = getInt(); System.out.print("Enter initial number of items: "); GO THROUGH THIS APP n = getInt(); ON YOUR OWN. keysPerCell = 10; // make table HashTable theHashTable = new HashTable(size);

for(int j=0; j<n; j++) // insert data { aKey = (int)(java.lang.Math.random() * keysPerCell * size); aDataItem = new DataItem(aKey); theHashTable.insert(aDataItem); }

Page 32: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

3232//4848

while(true) { // interact with use System.out.print("Enter first letter of "); System.out.print("show, insert, delete, or find: "); char choice = getChar(); switch(choice) { case 's': theHashTable.displayTable(); break; case 'i': System.out.print("Enter key value to insert: "); aKey = getInt(); aDataItem = new DataItem(aKey); theHashTable.insert(aDataItem); break; case 'd': System.out.print("Enter key value to delete: "); aKey = getInt(); theHashTable.delete(aKey); break; case 'f': System.out.print("Enter key value to find: "); aKey = getInt(); aDataItem = theHashTable.find(aKey); if(aDataItem != null) System.out.println("Found " + aKey); else System.out.println("Could not find " + aKey); break; default: System.out.print("Invalid entry\n"); } // end switch } // end while } // end main()

Page 33: 1 Chapter 11 Hash Tables Part 1. 2 / 48 Introduction Very VERY fast way to build and access tables (and records in files too). Provide nearly O(1) performance

3333//4848

public static String getString() throws IOException { InputStreamReader isr = new InputStreamReader(System.in); BufferedReader br = new BufferedReader(isr); String s = br.readLine(); return s; }//-------------------------------------------------------------- public static char getChar() throws IOException { String s = getString(); return s.charAt(0); }//------------------------------------------------------------- public static int getInt() throws IOException { String s = getString(); return Integer.parseInt(s); }//-------------------------------------------------------------- } // end class HashTableApp

Go through on your own.