18
Universidade Federal da Paraíba Centro de Informática Hash Tables I Lecture 10 1107186 – Estrutura de Dados – Turma 02 Prof. Christian Azambuja Pagot CI / UFPB

Data Structures - 07. Hash Tables, I

Embed Size (px)

DESCRIPTION

Slide da cadeira de Estrutura de Dados, ministrado pelo Prof. Dr. Christian Pagot, na Universidade Federal da Paraíba.

Citation preview

Page 1: Data Structures - 07. Hash Tables, I

Universidade Federal da ParaíbaCentro de Informática

Hash Tables ILecture 10

1107186 – Estrutura de Dados – Turma 02

Prof. Christian Azambuja PagotCI / UFPB

Page 2: Data Structures - 07. Hash Tables, I

2Universidade Federal da ParaíbaCentro de Informática

Let's Start with a Problem

● Suppose that you wish to implement a dictionary to store (key, data item) pairs where:– key = an uppercase character.

– data item = anything.

● In this case, there will be up to 26 distinct dictionary entries!

Which data structurecould be used to

implement it?

Page 3: Data Structures - 07. Hash Tables, I

3Universidade Federal da ParaíbaCentro de Informática

An Array-based Approach

● Each key (char) can be used to compute the corresponding index of the array:– Array index = h(key)

● In this case, h is known as hash function!

unsigned int h(char c){

return c – 'A';}

C code excerpt:

Page 4: Data Structures - 07. Hash Tables, I

4Universidade Federal da ParaíbaCentro de Informática

A more Interesting Problem

● Now, suppose that the keys are the plates of cars and that they present the following format:

– where C is a uppercase character and D is a decimal digit.

● 26*26*26*10*10*10*10 = 175.760.000 distinct possibilities!

CCCDDDD

It can be the case that this is too much data for the previous array-based

implementation!

Page 5: Data Structures - 07. Hash Tables, I

5Universidade Federal da ParaíbaCentro de Informática

What is a Hash Table?

● A hash table is a data structure used to implement associative arrays (also called maps, dictionaries, etc).

● Its use is effective in the case where the number of actual keys is small when compared to the total number of possible keys.

Page 6: Data Structures - 07. Hash Tables, I

6Universidade Federal da ParaíbaCentro de Informática

What is a Hash Table?

Universe of keys

k1

k 2

k 3

k 4

Universe ofactual keys

Index Data

...

...

...

Hash table

h(k1)

h(k3)

h(k2)=h(k

4)

Page 7: Data Structures - 07. Hash Tables, I

7Universidade Federal da ParaíbaCentro de Informática

Applications

● It can be used when we want to keep in-memory dictionaries:

● Compilers: – Symbol tables (user-defined symbols).

● Fast access to records in databases.● Web search.● Etc.

Page 8: Data Structures - 07. Hash Tables, I

8Universidade Federal da ParaíbaCentro de Informática

Implementing Our Car Database with a Hash Table

● Defining the hash function:– The key (car plate) has the following format:

– One possibility is to transform each D to its corresponding digit (with the place value) and each C to a value in the range [0,25] (with the place value).

CCCDDDD

Page 9: Data Structures - 07. Hash Tables, I

9Universidade Federal da ParaíbaCentro de Informática

Implementing Our Car Database with a Hash Table

● Implementation of the hash function:unsigned int Hash(char* k){

int i;unsigned int hash = 0;int place = 1;

for (i=6; i>=3; i­­) {hash = hash + Char2Int(k[i]) * place;place *= 10;

}

for (i=2; i>=0; i­­) {hash = hash + (k[i] ­ 'A') * place;place *= 26;

}

return hash;}

C code excerpt:

The hash may extrapolate

the actual sizeof the array!

Page 10: Data Structures - 07. Hash Tables, I

10Universidade Federal da ParaíbaCentro de Informática

Implementing Our Car Database with a Hash Table

● Reducing the hash to a valid index:– The hash can be constrained to a valid index

with the help of the modulus operator:

hash = Hash(plate);compressed_hash = hash % HASH_TABLE_MAX_ENTRIES;

C code excerpt:

Size of the array

Page 11: Data Structures - 07. Hash Tables, I

11Universidade Federal da ParaíbaCentro de Informática

How to solve collisions?

Implementing Our Car Database with a Hash Table

● Assuming an array with 5 entries, and the following plates:

Plate Hash Reduced hashNWL5356 93715356 1

QBH2709 108492709 4

ZOW6261 172866261 1

IDD2023 54892023 3

XRJ2289 159992289 4

Page 12: Data Structures - 07. Hash Tables, I

12Universidade Federal da ParaíbaCentro de Informática

Solving Collisions

● There are several ways to solve collisions. They include:– Chaining.

– Open Addressing (linear probing).

– Etc.

Page 13: Data Structures - 07. Hash Tables, I

13Universidade Federal da ParaíbaCentro de Informática

Chaining

● The hash table consists of an array whose elements are pointers to lists:

Index Data

0 NULL

1

2 NULL

3

... NULL

... NULL

n-1

Hash table

Each nodecontains the

corresponding key

Each list contains elementswhose the key hash to the

same value.

Page 14: Data Structures - 07. Hash Tables, I

14Universidade Federal da ParaíbaCentro de Informática

Implementing Our Car Database with a Hash Table

● The structure of the array and the list nodes:

struct Node{

char plate[8];struct Node* next;

};

typedef struct Node* HashTable[HASH_TABLE_MAX_ENTRIES];

...

C code excerpt:

Page 15: Data Structures - 07. Hash Tables, I

15Universidade Federal da ParaíbaCentro de Informática

Implementing Our Car Database with a Hash Table

● Creating the hash table:

void InitHashTable(HashTable h){

unsigned int i;

for (i=0; i<HASH_TABLE_MAX_ENTRIES; i++)h[i]= NULL;

}

...

HashTable h;InitHashTable(h);...

C code excerpt:

Page 16: Data Structures - 07. Hash Tables, I

16Universidade Federal da ParaíbaCentro de Informática

Implementing Our Car Database with a Hash Table

● Inserting data into the hash table:

void InsertPlateIntoHashTable(HashTable h, char* p){

... 

hash = Hash(p);compressed_hash = hash % HASH_TABLE_MAX_ENTRIES;

if (h[compressed_hash] == NULL){

h[compressed_hash] = (struct Node*) malloc (sizeof(struct Node));strcpy(h[compressed_hash]­>plate, p);h[compressed_hash]­>next = NULL;

}else... 

C code excerpt:

Page 17: Data Structures - 07. Hash Tables, I

17Universidade Federal da ParaíbaCentro de Informática

Implementing Our Car Database with a Hash Table

● Inserting data into the hash table:

... else{

struct Node* n = h[compressed_hash];

while (n­>next != NULL)n = n­>next;

n­>next = (struct Node*) malloc (sizeof(struct Node));strcpy(n­>next­>plate, p);n­>next­>next = NULL;

}

return collision;}

C code excerpt:

Page 18: Data Structures - 07. Hash Tables, I

18Universidade Federal da ParaíbaCentro de Informática

Open Addressing (Linear Probing)

● When collision is detected, instead of inserting the node in a list, the algorithm searches for a free slot in the array to insert the new element.