Lecture 5b graphs and hashing

GRAPHS - INTRODUCTION

Many real-life problems can be formulated in terms of sets of objects and relationships or connections between objects. Examples include:

Finding routes between cities: the objects could be towns, and the connections could be road/rail links.

Deciding what first year courses to take: the objects are courses, and the relationships are prerequisite and corequisite relations. Similarly, planning a course: the objects are topics, and the relations are prerequisites between topics (you have to understand topic X before topic Y will make sense).

Graphs

DEFINITIONS

A graph is a data structure (ADT) that consists of a set of vertices (or nodes) (which can represent objects), and a set of edges linking vertices (which can represent relationships between the objects).

A tree is a special kind of graph (with certain restrictions).

Graph algorithms operate on a graph data structure, and allow us to, for example, search a graph for a path between two given nodes; find the shortest path between two nodes; or order the vertices in the graph in a particular way.

DEFINITIONS……

A graph is a generalization of the tree structure, where instead of a strict parent/child relationship between tree nodes, any kind of complex relationships between the nodes can be represented.

The graph ADT follows directly from the GRAPH concept from mathematics.

DEFINITIONS.....

Incident edge: if (vi,vj) is an edge, then the edge (vi,vj) is said to be incident to vertices vi and vj.

If vi and vj are connected by an edge, they are said to be adjacent vertices/nodes.

vi and vj are the endpoints of the edge {vi, vj}.

If an edge e is connected to v, then v is said to be incident on e. Also, the edge e is said to be incident on v.

DEFINITIONS

Cycle: a path that ends back at the starting node. Example: A, B, C, G, A

Simple path: a path that contains no cycles

Acyclic graph: a graph with no cycles. A connected acyclic undirected graph is a tree.

[Figure: unconnected graph with two connected components, on nodes A, B, C, G, H, K, N]

DEFINITIONS

Two nodes are reachable if a path exists between them.

Connected graph: every node is reachable from any other node.

[Figure: connected graph]

Degree of vertex.....

The degree of a vertex is the number of edges incident onto the vertex.

For a directed graph:

In-degree of a vertex vi is the number of edges incident onto vi, with vi as the head.

Out-degree of a vertex vi is the number of edges incident onto vi, with vi as the tail.

In other words, in a directed graph the number of edges that point to a given vertex is called its in-degree, and the number that point from it is called its out-degree.
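The in-degree and out-degree definitions can be sketched in a few lines of C#. This is only an illustration: the class name DegreeDemo, the method name Degrees and the successor-list input format are assumptions made for this example, not something the lecture prescribes.

```csharp
using System;

public static class DegreeDemo
{
    // successors[v] lists the heads of all edges whose tail is v.
    // Returns the in-degree and out-degree of every vertex.
    public static (int[] inDegree, int[] outDegree) Degrees(int[][] successors)
    {
        int n = successors.Length;
        var inDegree = new int[n];
        var outDegree = new int[n];
        for (int tail = 0; tail < n; tail++)
        {
            // Every edge leaving 'tail' counts towards its out-degree.
            outDegree[tail] = successors[tail].Length;
            foreach (int head in successors[tail])
            {
                // Every edge arriving at 'head' counts towards its in-degree.
                inDegree[head]++;
            }
        }
        return (inDegree, outDegree);
    }
}
```

For a hypothetical graph with edges 0->1, 0->2 and 1->2, vertex 0 has out-degree 2 and in-degree 0, while vertex 2 has in-degree 2 and out-degree 0.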

Directed Graph (Digraph)

Each edge has an origin node and a terminating node.

A graph is connected if there is a path between any two vertices.

A directed graph is strongly connected if there is a directed path between any two vertices (edges have directions).

The degree of a vertex is the number of edges incident to it.

Undirected Graph (Undigraph)

A graph is undirected if the existence of edge (x,y) implies the existence of edge (y,x). An edge of the form (x,x) is said to be a loop.

If x is y's friend several times over, that could be modeled using multiedges: multiple edges between the same pair of vertices.

A graph is said to be simple if it contains no loops and no multiple edges.

…….

A path is a sequence of edges connecting two vertices.

Since Brooks is my father's-sister's-husband's cousin, there is a path between me and him! Etc.

Graphs

Directed graph: edges have direction

Undirected graph: edges are undirected

[Figure: example directed and undirected graphs]

Weighted graph

A weight (cost) is associated with each edge

[Figure: weighted graph on nodes A, C, D, E, F, G, H, J, K, N, Q with edge weights]

Edges are of 2 types:

Directed edge: A directed edge between the vertices vi and vj is an ordered pair. It is denoted by <vi,vj>.

Undirected edge: An undirected edge between the vertices vi and vj is an unordered pair. It is denoted by (vi,vj).

Maximum number of edges: The maximum number of edges in a simple undirected graph with n vertices is n(n−1)/2, since each of the n vertices can be joined to the other n−1 and every edge is then counted twice.

In a simple directed graph, it is n(n−1), since each ordered pair of distinct vertices can carry its own edge.

PATHS

Path (in directed graph)

Examples:

A, B, C is a path

A, G, K is not a path

[Figure: directed graph on nodes A, B, C, G, H, K, N]

PATHS

Path in undirected graph

Examples:

A, B, C is a path

H, K, C is not a path

[Figure: undirected graph on nodes A, B, C, G, H, K, N]
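Whether a sequence of vertices is a path can be checked mechanically: every consecutive pair must be joined by an edge. The sketch below assumes a successor-list representation; the class PathCheck and the edge set in Example() are hypothetical, since the edges of the figure are not recoverable from the slides. For an undirected graph, each edge would simply be stored in both directions.

```csharp
using System;
using System.Collections.Generic;

public static class PathCheck
{
    // Returns true if every consecutive pair (u, v) in the sequence
    // is joined by an edge u -> v in the successor lists.
    public static bool IsPath(Dictionary<char, char[]> successors, string sequence)
    {
        for (int i = 0; i + 1 < sequence.Length; i++)
        {
            char u = sequence[i];
            char v = sequence[i + 1];
            if (!successors.TryGetValue(u, out char[] next) || Array.IndexOf(next, v) < 0)
            {
                return false; // no edge u -> v, so the sequence is broken here
            }
        }
        return true;
    }

    // Hypothetical directed edges: A->B, B->C, C->G, G->A, H->K.
    public static Dictionary<char, char[]> Example()
    {
        return new Dictionary<char, char[]>
        {
            ['A'] = new[] { 'B' },
            ['B'] = new[] { 'C' },
            ['C'] = new[] { 'G' },
            ['G'] = new[] { 'A' },
            ['H'] = new[] { 'K' },
        };
    }
}
```

With these assumed edges, IsPath(Example(), "ABC") is true, IsPath(Example(), "AGK") is false because there is no edge A->G, and "ABCGA" is a path that is also a cycle.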

Representing Graphs

Adjacency list: each node holds a list of its neighbors

1: {2, 4}
2: {3}
3: {1}
4: {2}

Adjacency matrix: each cell keeps whether and how two nodes are connected

    1 2 3 4
1   0 1 0 1
2   0 0 1 0
3   1 0 0 0
4   0 1 0 0

Set of edges

{1,2} {1,4} {2,3} {3,1} {4,2}

[Figure: directed graph on nodes 1, 2, 3, 4]

Adjacency Matrix

• An n × n 2D array A, where n is the number of vertices in the graph

• Each row and column is indexed by the vertex id, e.g. a=0, b=1, c=2, d=3, e=4

• An array entry A[i][j] is equal to 1 if there is an edge connecting vertices i and j. Otherwise, A[i][j] is 0.

Adjacency Matrix

[Figure: undirected graph on vertices 0 to 9]

    0 1 2 3 4 5 6 7 8 9
0   0 0 0 0 0 0 0 0 1 0
1   0 0 1 1 0 0 0 1 0 1
2   0 1 0 0 1 0 0 0 1 0
3   0 1 0 0 1 1 0 0 0 0
4   0 0 1 1 0 0 0 0 0 0
5   0 0 0 1 0 0 1 0 0 0
6   0 0 0 0 0 1 0 1 0 0
7   0 1 0 0 0 0 1 0 0 0
8   1 0 1 0 0 0 0 0 0 1
9   0 1 0 0 0 0 0 0 1 0
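As a minimal sketch, the matrix for the 10-vertex example can be built from its edges as follows. The class name AdjacencyMatrixDemo and the edge-pair input format are assumptions for illustration; the 12 edges were read off the matrix for the example graph.

```csharp
using System;

public static class AdjacencyMatrixDemo
{
    // The 12 undirected edges of the 10-vertex example graph.
    public static readonly int[][] ExampleEdges =
    {
        new[] { 0, 8 }, new[] { 1, 2 }, new[] { 1, 3 }, new[] { 1, 7 },
        new[] { 1, 9 }, new[] { 2, 4 }, new[] { 2, 8 }, new[] { 3, 4 },
        new[] { 3, 5 }, new[] { 5, 6 }, new[] { 6, 7 }, new[] { 8, 9 },
    };

    // Builds the n x n matrix: A[i][j] = 1 if there is an edge between i and j.
    public static int[][] FromEdges(int n, int[][] edges)
    {
        var a = new int[n][];
        for (int i = 0; i < n; i++)
        {
            a[i] = new int[n]; // every entry starts at 0 (no edge)
        }
        foreach (int[] e in edges)
        {
            a[e[0]][e[1]] = 1;
            a[e[1]][e[0]] = 1; // undirected graph: the matrix is symmetric
        }
        return a;
    }

    // Checking for an edge is a single array access.
    public static bool HasEdge(int[][] a, int i, int j)
    {
        return a[i][j] == 1;
    }
}
```

Calling FromEdges(10, ExampleEdges) allocates all 100 cells even though only 24 of them hold a 1, which is the space cost this representation pays for constant-time edge checks.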

Adjacency List

• The adjacency list is an array A[0..n-1] of lists, where n is the number of vertices in the graph.

• Each array entry is indexed by the vertex id (as with adjacency matrix)

• The list A[i] stores the ids of the vertices adjacent to i.

Adjacency Lists

An adjacency list consists of an array of pointers, where the ith element points to a linked list of the edges incident on vertex i.

It is implemented by representing each node as a data structure that contains a list of all adjacent nodes.

In an adjacency matrix, by contrast, the rows and columns of a two-dimensional array represent source and destination vertices, and the entries in the array indicate whether an edge exists between the vertices.

Adjacency List

[Figure: undirected graph on vertices 0 to 9]

0: 8
1: 2 3 7 9
2: 1 4 8
3: 1 4 5
4: 2 3
5: 3 6
6: 5 7
7: 1 6
8: 0 2 9
9: 1 8
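The same 10-vertex graph can be stored as an array of lists. The sketch below is one possible way to build it; the class name AdjacencyListDemo and the use of List<int> instead of hand-written linked lists are assumptions made to keep the example short.

```csharp
using System;
using System.Collections.Generic;

public static class AdjacencyListDemo
{
    // The 12 undirected edges of the 10-vertex example graph.
    public static readonly int[][] ExampleEdges =
    {
        new[] { 0, 8 }, new[] { 1, 2 }, new[] { 1, 3 }, new[] { 1, 7 },
        new[] { 1, 9 }, new[] { 2, 4 }, new[] { 2, 8 }, new[] { 3, 4 },
        new[] { 3, 5 }, new[] { 5, 6 }, new[] { 6, 7 }, new[] { 8, 9 },
    };

    // Builds A[0..n-1], where A[i] holds the ids of the vertices adjacent to i.
    public static List<int>[] FromEdges(int n, int[][] edges)
    {
        var adj = new List<int>[n];
        for (int i = 0; i < n; i++)
        {
            adj[i] = new List<int>();
        }
        foreach (int[] e in edges)
        {
            adj[e[0]].Add(e[1]);
            adj[e[1]].Add(e[0]); // undirected: each edge appears on two lists
        }
        return adj;
    }

    // Checking for an edge means scanning the list of one endpoint.
    public static bool HasEdge(List<int>[] adj, int i, int j)
    {
        return adj[i].Contains(j);
    }
}
```

Here the total storage is one array slot per vertex plus two list entries per edge, but HasEdge has to walk a list instead of reading a single cell.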

Adjacency Multilist

In the adjacency-list representation of an undirected graph, each edge (u, v) is represented by two entries, one on the list for u and the other on the list for v.

Multilists: lists in which nodes may be shared among several lists.

For each edge there will be exactly one node, but this node will be in two lists (i.e., the adjacency lists for each of the two vertices on which the edge is incident).

Adjacency Lists vs. Matrix

Adjacency Lists

• More compact than adjacency matrices if the graph has few edges

• Require more time to find whether an edge exists

Adjacency Matrix

• Always requires n² space. This can waste a lot of space if the graph is sparse (has few edges)

• Can quickly find whether an edge exists

Operations

Typical operations associated with graphs are: finding a path between two nodes, e.g. the shortest path from one node to another.

A directed graph can be seen as a flow network, where each edge has a capacity and each edge receives a flow.

Comparison with other data structures

Graph data structures are non-hierarchical and therefore suitable for data sets where the individual elements are interconnected in complex ways.

For example, a computer network can be simulated with a graph.

Hierarchical data sets can be represented by a binary or non-binary tree.

It is worth mentioning, however, that trees can be seen as a special form of graph.

Graph traversal

Traversal of graph implies visiting the nodes of the graph.

A graph can be traversed in 2 ways:

• Depth first traversal

• Breadth first traversal

Depth First traversal

When a graph is traversed by visiting the nodes in the forward (deeper) direction as long as possible, the traversal is called depth-first traversal.

E.g. the depth-first traversal starting at the vertex 0 visits the nodes in one of the orders: 0 1 2 6 7 8 5 3 4, or 0 4 3 5 8 6 7 2 1
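A recursive depth-first traversal can be sketched as below. Because the figure for the example orders is not available, the code uses the 10-vertex graph from the adjacency list slide instead; the class name DepthFirst and the choice to explore neighbors in increasing id order are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class DepthFirst
{
    // Adjacency lists of the 10-vertex example graph (index = vertex id).
    public static readonly int[][] Example =
    {
        new[] { 8 },          new[] { 2, 3, 7, 9 }, new[] { 1, 4, 8 },
        new[] { 1, 4, 5 },    new[] { 2, 3 },       new[] { 3, 6 },
        new[] { 5, 7 },       new[] { 1, 6 },       new[] { 0, 2, 9 },
        new[] { 1, 8 },
    };

    // Returns the vertices reachable from 'start' in depth-first order.
    public static List<int> Traverse(int[][] adj, int start)
    {
        var visited = new bool[adj.Length];
        var order = new List<int>();
        Visit(adj, start, visited, order);
        return order;
    }

    private static void Visit(int[][] adj, int v, bool[] visited, List<int> order)
    {
        visited[v] = true;
        order.Add(v);
        foreach (int w in adj[v])
        {
            if (!visited[w])
            {
                // Go deeper before looking at the remaining neighbors of v.
                Visit(adj, w, visited, order);
            }
        }
    }
}
```

Traverse(Example, 0) visits the vertices in the order 0 8 2 1 3 4 5 6 7 9. A different neighbor order would give a different, equally valid depth-first order, which is why more than one order can be listed for the same start vertex.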

Breadth first traversal

When a graph is traversed by visiting all the adjacent nodes/vertices of a node/vertex first, the traversal is called breadth-first traversal.

For a graph in which the breadth-first traversal starts at vertex v1, visits to the nodes take place in the order shown in Figure
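Breadth-first traversal is usually written with a queue. The sketch below again uses the 10-vertex graph from the adjacency list slide, since the figure referred to here is not available; the class name BreadthFirst and the start vertex are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class BreadthFirst
{
    // Adjacency lists of the 10-vertex example graph (index = vertex id).
    public static readonly int[][] Example =
    {
        new[] { 8 },          new[] { 2, 3, 7, 9 }, new[] { 1, 4, 8 },
        new[] { 1, 4, 5 },    new[] { 2, 3 },       new[] { 3, 6 },
        new[] { 5, 7 },       new[] { 1, 6 },       new[] { 0, 2, 9 },
        new[] { 1, 8 },
    };

    // Returns the vertices reachable from 'start' in breadth-first order.
    public static List<int> Traverse(int[][] adj, int start)
    {
        var visited = new bool[adj.Length];
        var order = new List<int>();
        var queue = new Queue<int>();
        visited[start] = true;
        queue.Enqueue(start);
        while (queue.Count > 0)
        {
            int v = queue.Dequeue();
            order.Add(v);
            foreach (int w in adj[v])
            {
                if (!visited[w])
                {
                    // Mark on enqueue so a vertex is never queued twice.
                    visited[w] = true;
                    queue.Enqueue(w);
                }
            }
        }
        return order;
    }
}
```

Traverse(Example, 1) visits the vertices in the order 1 2 3 7 9 4 8 5 6 0: first the start vertex, then all of its neighbors, then their unvisited neighbors, and so on.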

Minimum Cost spanning tree

When the edges of the graph have weights representing the cost in some suitable terms, we can obtain the spanning tree of the graph whose cost is minimum in terms of the weights of the edges.

For this, we start with the edge with the minimum cost/weight, add it to a set T, and mark it as visited.

We next consider the edge with minimum cost that is not yet visited, and mark it as visited. Before adding the edge to the set T, we first check whether its two vertices are already connected by edges in T; if they are, we do not add it to T, because it would form a cycle. Otherwise, we add it to T. (It is not enough to check that both vertices have been visited: they may lie in two separate pieces of T, in which case the edge joins the pieces without forming a cycle.) We repeat this until T contains n−1 edges, where n is the number of vertices.

The minimum-cost spanning tree of the graph is as shown
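The edge-by-edge procedure described here is commonly known as Kruskal's algorithm. A minimal sketch follows, assuming a union-find (disjoint set) array to answer the question "are these two vertices already connected by edges in T?"; the class name KruskalSketch, the tuple edge format and the example graph in the note are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KruskalSketch
{
    // Follows parent links to the representative of x's component,
    // shortening the chain as it goes.
    private static int Find(int[] parent, int x)
    {
        while (parent[x] != x)
        {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Returns the edges of a minimum-cost spanning tree (or forest,
    // if the graph is not connected). Each edge is (U, V, W = weight).
    public static List<(int U, int V, int W)> MinimumSpanningTree(
        int vertexCount, IEnumerable<(int U, int V, int W)> edges)
    {
        int[] parent = Enumerable.Range(0, vertexCount).ToArray();
        var tree = new List<(int U, int V, int W)>();
        // Consider the edges in order of increasing weight.
        foreach (var edge in edges.OrderBy(candidate => candidate.W))
        {
            int rootU = Find(parent, edge.U);
            int rootV = Find(parent, edge.V);
            if (rootU == rootV)
            {
                continue; // endpoints already connected: the edge would form a cycle
            }
            parent[rootU] = rootV; // merge the two components
            tree.Add(edge);
        }
        return tree;
    }
}
```

For four vertices and the assumed edges (0,1) of weight 1, (2,3) of weight 2, (1,2) of weight 3 and (0,3) of weight 4, the sketch keeps the first three edges (total cost 6) and rejects (0,3). Note that edge (1,2) is accepted even though both of its endpoints already appear in T, because they lie in different components at that point.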

BFS and Shortest Path Problem

Given any source vertex s, BFS visits the other vertices at increasing distances away from s. In doing so, BFS discovers shortest paths (in number of edges) from s to the other vertices.

What do we mean by “distance”? The number of edges on a path from s.

Example

[Figure: undirected graph on vertices 0 to 9, with the source s and the distance of each vertex marked]

Consider s = vertex 1

Nodes at distance 1? 2, 3, 7, 9

Nodes at distance 2? 8, 6, 5, 4

Nodes at distance 3? 0
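The distances in this example can be reproduced with a small variation of BFS that records, for each vertex, the number of edges on the path by which it was first reached. The class name BfsDistances is an assumption; the adjacency lists are those of the 10-vertex example graph.

```csharp
using System;
using System.Collections.Generic;

public static class BfsDistances
{
    // Adjacency lists of the 10-vertex example graph (index = vertex id).
    public static readonly int[][] Example =
    {
        new[] { 8 },          new[] { 2, 3, 7, 9 }, new[] { 1, 4, 8 },
        new[] { 1, 4, 5 },    new[] { 2, 3 },       new[] { 3, 6 },
        new[] { 5, 7 },       new[] { 1, 6 },       new[] { 0, 2, 9 },
        new[] { 1, 8 },
    };

    // Returns dist[v] = number of edges on a shortest path from s to v,
    // or -1 if v is not reachable from s.
    public static int[] Distances(int[][] adj, int s)
    {
        var dist = new int[adj.Length];
        for (int i = 0; i < dist.Length; i++)
        {
            dist[i] = -1; // -1 means "not discovered yet"
        }
        var queue = new Queue<int>();
        dist[s] = 0;
        queue.Enqueue(s);
        while (queue.Count > 0)
        {
            int v = queue.Dequeue();
            foreach (int w in adj[v])
            {
                if (dist[w] == -1)
                {
                    dist[w] = dist[v] + 1; // one edge further than v
                    queue.Enqueue(w);
                }
            }
        }
        return dist;
    }
}
```

Distances(Example, 1) returns 3 0 1 1 2 2 2 1 2 1 for vertices 0 to 9, which agrees with the answers above: 2, 3, 7, 9 at distance 1; 4, 5, 6, 8 at distance 2; and 0 at distance 3.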

Graphs and Their Applications

Graphs have many real-world applications:

• Modeling a computer network like the Internet: routes are simple paths in the network

• Modeling a city map: streets are edges, crossings are vertices

• Social networks: people are nodes and their connections are edges

• State machines: states are nodes, transitions are edges

Representing Graphs in C#

public class Graph
{
    // childNodes[v] holds the successors of vertex v
    private readonly int[][] childNodes;

    public Graph(int[][] nodes)
    {
        this.childNodes = nodes;
    }
}

Graph g = new Graph(new int[][] {
    new int[] {3, 6},          // successors of vertex 0
    new int[] {2, 3, 4, 5, 6}, // successors of vertex 1
    new int[] {1, 4, 5},       // successors of vertex 2
    new int[] {0, 1, 5},       // successors of vertex 3
    new int[] {1, 2, 6},       // successors of vertex 4
    new int[] {1, 2, 3},       // successors of vertex 5
    new int[] {0, 1, 4}        // successors of vertex 6
});

[Figure: the graph on vertices 0 to 6 described by g]

HASH TABLES - INTRODUCTION

WHY the use of Hash tables

Hash tables are good for doing a quick search on things.

For instance, suppose we have an array full of data (say 100 items). If we knew the position at which a specific item is stored in the array, then we could quickly access it.

For instance, if we just happen to know that the item we want is at position 3, we can apply: myitem=myarray[3];

HASH TABLES - INTRODUCTION

With this, we don't have to search through each element in the array, we just access position 3.

The question is, how do we know that position 3 stores the data that we are interested in?

This is where hashing comes in handy. Given some key, we can apply a hash function to it to find an index or position that we want to access.
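As a rough illustration of turning a key into an array position, the sketch below hashes a string by combining its characters and then reduces the result modulo the table size. The class name SimpleHash, the multiplier 31 and the table size are assumptions chosen for the example; the lecture does not prescribe a particular hash function.

```csharp
using System;

public static class SimpleHash
{
    // Maps a string key to an index in the range 0 .. tableSize-1.
    public static int Index(string key, int tableSize)
    {
        uint h = 0;
        foreach (char c in key)
        {
            // Mix in each character; overflow simply wraps around.
            h = unchecked(h * 31u + (uint)c);
        }
        return (int)(h % (uint)tableSize);
    }
}
```

For example, Index("cat", 10) is 2, so a record with key "cat" would be stored at, and later looked up from, position 2 of a 10-slot array without searching the other slots.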

HASH FUNCTION

Hashed Table

A hashed table is a table that is managed with an internal hash procedure.

A hashed table is a set, whose elements you can address using their unique key.

Unlike standard and sorted tables, you cannot access hash tables using an index.

All entries in the table must have a unique key.

[Figure: a small phone book as a hash table]
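The phone book idea can be tried out with the .NET Dictionary<TKey,TValue> class, which is implemented as a hash table. In the sketch below the class name PhoneBookDemo, the names and the phone numbers are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class PhoneBookDemo
{
    // Builds a small phone book keyed by name (hypothetical entries).
    public static Dictionary<string, string> Build()
    {
        return new Dictionary<string, string>
        {
            ["Joan"] = "555-0101",
            ["Amina"] = "555-0102",
            ["Peter"] = "555-0103",
        };
    }

    // Looks an entry up by its key; returns null if the name is unknown.
    public static string Lookup(Dictionary<string, string> book, string name)
    {
        return book.TryGetValue(name, out string number) ? number : null;
    }

    // Keys must be unique: refuse to add a second entry for the same name.
    public static bool TryAddEntry(Dictionary<string, string> book, string name, string number)
    {
        if (book.ContainsKey(name))
        {
            return false;
        }
        book.Add(name, number);
        return true;
    }
}
```

Entries are addressed by key rather than by position: Lookup(book, "Joan") returns Joan's number directly, and TryAddEntry returns false if "Joan" is already present.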

Choosing a good hash function

A good hash function is essential for good hash table performance.

A poor choice of hash function is likely to lead to clustering, in which the probability of keys mapping to the same hash bucket (i.e. a collision) is significantly greater than would be expected from a random function.

Collision resolution

If two keys hash to the same index, the corresponding records cannot be stored in the same location.

So, if it's already occupied, we must find another location to store the new record, and do it so that we can find it when we look it up later on.

There are a number of collision resolution techniques, including chaining and open addressing.

…….

The difference between the two has to do with whether colliding records are stored outside the table (open hashing, i.e. chaining) or whether a collision results in storing one of the records at another slot in the table (closed hashing, i.e. open addressing).

Chaining

[Figure: hash collision resolved by chaining]

In the simplest chained hash table technique, each slot in the array references a linked list of the inserted records that hash to the same slot.

Insertion requires finding the correct slot, and appending to either end of the list in that slot; deletion requires searching the list and removal.

Chained hash tables inherit the disadvantages of linked lists.

When storing small records, the overhead of the linked list can be significant. Also, traversing a linked list has poor cache performance.
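A chained hash table along these lines might look like the sketch below. The class name ChainedHashTable, the string keys and values, and the simple character-based hash are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public class ChainedHashTable
{
    // Each slot references the linked list (chain) of records hashing to it.
    private readonly LinkedList<KeyValuePair<string, string>>[] slots;

    public ChainedHashTable(int size)
    {
        slots = new LinkedList<KeyValuePair<string, string>>[size];
    }

    private int SlotOf(string key)
    {
        uint h = 0;
        foreach (char c in key) h = unchecked(h * 31u + (uint)c);
        return (int)(h % (uint)slots.Length);
    }

    // Insertion: find the slot, replace an existing record with the same key,
    // otherwise append the new record to the end of the chain.
    public void Put(string key, string value)
    {
        int i = SlotOf(key);
        if (slots[i] == null) slots[i] = new LinkedList<KeyValuePair<string, string>>();
        for (var node = slots[i].First; node != null; node = node.Next)
        {
            if (node.Value.Key == key)
            {
                node.Value = new KeyValuePair<string, string>(key, value);
                return;
            }
        }
        slots[i].AddLast(new KeyValuePair<string, string>(key, value));
    }

    // Lookup: search only the chain of the slot the key hashes to.
    public bool TryGet(string key, out string value)
    {
        var chain = slots[SlotOf(key)];
        if (chain != null)
        {
            foreach (var pair in chain)
            {
                if (pair.Key == key) { value = pair.Value; return true; }
            }
        }
        value = null;
        return false;
    }

    // Deletion: search the chain and unlink the matching node.
    public bool Remove(string key)
    {
        var chain = slots[SlotOf(key)];
        if (chain == null) return false;
        for (var node = chain.First; node != null; node = node.Next)
        {
            if (node.Value.Key == key) { chain.Remove(node); return true; }
        }
        return false;
    }
}
```

Creating the table with a single slot, new ChainedHashTable(1), forces every key to collide, which is a quick way to see that lookups still succeed and simply degrade to a linked-list search.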

Open Addressing

Open addressing hash tables can store the records directly within the array.

A hash collision is resolved by probing, or searching through alternate locations in the array (the probe sequence) until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table.

Probe sequences include:

• Linear probing: the interval between probes is fixed, often at 1.

• Quadratic probing: the interval between probes increases linearly (hence, the indices are described by a quadratic function).

• Double hashing: the interval between probes is fixed for each record but is computed by another hash function.
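The three probe sequences, and a small table that uses the linear one, can be sketched as follows. Everything here is illustrative: the names Probe and LinearProbingTable, the particular quadratic formula (home + attempt²), and the omission of deletion are assumptions made to keep the example short.

```csharp
using System;

public static class Probe
{
    // Slot to try on the given attempt (0, 1, 2, ...) for a key whose
    // home slot is 'home' in a table of 'size' slots.
    public static int Linear(int home, int attempt, int size) => (home + attempt) % size;

    public static int Quadratic(int home, int attempt, int size) => (home + attempt * attempt) % size;

    // 'step' comes from a second hash function and must not be 0.
    public static int DoubleHash(int home, int step, int attempt, int size) => (home + attempt * step) % size;
}

public class LinearProbingTable
{
    private readonly string[] keys;   // null marks an unused slot
    private readonly string[] values;

    public LinearProbingTable(int size)
    {
        keys = new string[size];
        values = new string[size];
    }

    private int Home(string key)
    {
        uint h = 0;
        foreach (char c in key) h = unchecked(h * 31u + (uint)c);
        return (int)(h % (uint)keys.Length);
    }

    // Stores the record in the first unused slot of the probe sequence
    // (or updates it if the key is met first). Returns false if the table is full.
    public bool Put(string key, string value)
    {
        int home = Home(key);
        for (int attempt = 0; attempt < keys.Length; attempt++)
        {
            int i = Probe.Linear(home, attempt, keys.Length);
            if (keys[i] == null || keys[i] == key)
            {
                keys[i] = key;
                values[i] = value;
                return true;
            }
        }
        return false;
    }

    // Probes until the key is found or an unused slot shows it is absent.
    public bool TryGet(string key, out string value)
    {
        int home = Home(key);
        for (int attempt = 0; attempt < keys.Length; attempt++)
        {
            int i = Probe.Linear(home, attempt, keys.Length);
            if (keys[i] == null) break;
            if (keys[i] == key) { value = values[i]; return true; }
        }
        value = null;
        return false;
    }
}
```

With home slot 5 in a 7-slot table, linear probing tries slots 5, 6, 0, ..., while the quadratic variant tries 5, 6, 2, 0, ... Unlike a chained table, this table can fill up: once every slot is used, Put returns false for a new key.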

……….

Open Addressing Vs. Chaining

Chained hash tables have the following advantages over open addressing:

They are simple to implement effectively and only require basic data structures.

From the point of view of writing suitable hash functions, chained hash tables are insensitive to clustering, only requiring minimization of collisions.

Open addressing depends upon better hash functions to avoid clustering. This is particularly important if novice programmers can add their own hash functions.

Open Addressing Vs. Chaining

Chained hash tables degrade in performance more gracefully. Although chains grow longer as the table fills, a chained hash table cannot "fill up" and does not exhibit the sudden increases in lookup times that occur in a near-full table with open addressing.

If the hash table stores large records, about 5 or more words per record, chaining uses less memory than open addressing.

Open Addressing Vs. Chaining

If the hash table is sparse (that is, it has a big array with many free array slots), chaining uses less memory than open addressing even for small records of 2 to 4 words per record due to its external storage.


Applications of Hash Tables

Hash tables are good in situations where you have enormous amounts of data from which you would like to quickly search and retrieve information.

A few typical hash table implementations would be in the following situations:

Applications of Hash Tables

Driver's license records. With a hash table, you could quickly get information about the driver (i.e. name, address, age) given the license number.

Compiler symbol tables. The compiler uses a symbol table to keep track of the user-defined symbols in a program. This allows the compiler to quickly look up attributes associated with symbols (for example, variable names)

Applications of Hash Tables…..

For internet search engines.

For telephone book databases. You could make use of a hash table implementation to quickly look up Joan’s telephone number.

For electronic library catalogs. Hash Table implementations allow for a fast find among the millions of materials stored in the library.

Applications of Hash Tables…..

For implementing passwords for systems with multiple users.

Hash Tables allow for a fast retrieval of the password which corresponds to a given username.

QUESTIONS

END
