Lecture 5: Trees

Tree Data Structure


Basic Tree Concepts


A tree consists of a finite set of elements, called nodes, and a finite set of directed lines, called branches, that connect the nodes.

The number of branches associated with a node is the degree of the node.


(Figure: a simple unordered tree. In this diagram, the node labelled 7 has two children, labelled 2 and 6, and one parent, labelled 2. The root node, at the top, has no parent.)

A tree is a widely used data structure that emulates a hierarchical tree structure with a set of linked nodes.


When a branch is directed toward the node, it is an in degree branch.

When the branch is directed away from the node, it is an out degree branch.

The sum of the in degree and out degree branches is the degree of the node.

If the tree is not empty, the first node is called the root.


The in degree of the root is, by definition, zero.

With the exception of the root, all of the nodes in a tree must have an in degree of exactly one; that is, they may have only one predecessor.

All nodes in the tree can have zero, one, or more branches leaving them; that is, they may have an out degree of zero, one, or more.
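The degree rules above can be checked mechanically. The sketch below assumes a tree stored as a hypothetical list of directed (parent, child) branches, with the same shape as the parenthetical-listing example A (B (C D) E F (G H I)) used later in this lecture; the names `branches`, `in_degree`, `out_degree` and `degree` are illustrative only.

```python
from collections import Counter

# Hypothetical example tree stored as directed branches (parent, child).
branches = [
    ("A", "B"), ("A", "E"), ("A", "F"),
    ("B", "C"), ("B", "D"),
    ("F", "G"), ("F", "H"), ("F", "I"),
]

nodes = {n for branch in branches for n in branch}
in_degree = Counter(child for _, child in branches)     # branches entering each node
out_degree = Counter(parent for parent, _ in branches)  # branches leaving each node


def degree(node: str) -> int:
    """Degree of a node = in degree + out degree (Counter yields 0 for missing keys)."""
    return in_degree[node] + out_degree[node]


# Tree rules: exactly one node (the root) has in degree 0; all others have in degree 1.
roots = [n for n in nodes if in_degree[n] == 0]
assert roots == ["A"]
assert all(in_degree[n] == 1 for n in nodes if n not in roots)

print(degree("A"), degree("B"), degree("C"))  # 3 3 1
```

Out degree is unrestricted, which is why no assertion is made about it.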


A leaf is any node with an out degree of zero, that is, a node with no successors.

A node that is not a root or a leaf is known as an internal node.

A node is a parent if it has successor nodes; that is, if it has out degree greater than zero.

A node with a predecessor is called a child.


Two or more nodes with the same parent are called siblings.

An ancestor is any node in the path from the root to the node.

A descendant is any node in the path below the parent node; that is, all nodes in the paths from a given node to a leaf are descendants of that node.
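As a minimal sketch of these definitions, the code below classifies the nodes of a hypothetical example tree stored as a child-to-parent map (the root maps to `None`). The helper names are illustrative, not from the lecture.

```python
# Hypothetical example tree stored as a child -> parent map (the root maps to None).
parent = {"A": None, "B": "A", "E": "A", "F": "A",
          "C": "B", "D": "B", "G": "F", "H": "F", "I": "F"}

# Derive each node's children from the parent map.
children = {node: [] for node in parent}
for node, p in parent.items():
    if p is not None:
        children[p].append(node)


def is_leaf(n):
    """A leaf has out degree zero (no successors)."""
    return not children[n]


def is_internal(n):
    """An internal node is neither the root nor a leaf."""
    return parent[n] is not None and not is_leaf(n)


def siblings(n):
    """Other nodes that share n's parent."""
    p = parent[n]
    return [] if p is None else [c for c in children[p] if c != n]


def ancestors(n):
    """Nodes on the path from the root down to n, excluding n itself."""
    path = []
    while parent[n] is not None:
        n = parent[n]
        path.append(n)
    return path[::-1]


def descendants(n):
    """All nodes on the paths from n down to the leaves, excluding n itself."""
    result = []
    for c in children[n]:
        result.append(c)
        result.extend(descendants(c))
    return result
```

For example, `ancestors("H")` gives `["A", "F"]` and `siblings("G")` gives `["H", "I"]`.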


A path is a sequence of nodes in which each node is adjacent to the next node.

The level of a node is its distance from the root. The root is at level 0, its children are at level 1, and so on.


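The height of the tree is the level of the leaf in the longest path from the root plus 1, so a tree consisting of only a root has height 1 and, under this convention, an empty tree has height 0. (Some texts, including the AVL tree section later in this lecture, instead take the height to be the level of the deepest leaf, which gives a single node height 0 and the empty tree height -1.)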

A sub tree is any connected structure below the root.

The first node in the sub tree is known as the root of the sub tree.
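Level and height can be computed directly from the definitions. This sketch assumes a hypothetical parent-to-children map; it follows the "deepest level plus 1" rule, so the empty tree (represented here as `None`) gets height 0, which is what the "plus 1" rule implies; the value -1 goes with the alternative convention that omits the plus 1.

```python
from typing import Dict, List, Optional

# Hypothetical example tree as a parent -> children map.
children: Dict[str, List[str]] = {
    "A": ["B", "E", "F"], "B": ["C", "D"], "F": ["G", "H", "I"],
    "C": [], "D": [], "E": [], "G": [], "H": [], "I": [],
}


def levels(root: str) -> Dict[str, int]:
    """Level of every node: its distance (number of branches) from the root."""
    level = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            level[child] = level[node] + 1   # one branch further from the root
            stack.append(child)
    return level


def height(root: Optional[str]) -> int:
    """Level of the deepest leaf plus 1; 0 for the empty tree (root is None)."""
    if root is None:
        return 0
    return max(levels(root).values()) + 1


def subtree_nodes(root: str) -> List[str]:
    """A sub tree is its own root plus everything connected below it."""
    return list(levels(root))
```

Any node can be passed as `root`, which is how the sub tree rooted at that node is measured.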


Recursive definition of a tree


A tree is a set of nodes that either:
• is empty, or
• has a designated node, called the root, from which hierarchically descend zero or more sub trees, which are also trees.
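The recursive definition translates almost word for word into a recursive type and a recursive function. In this sketch (class and function names are illustrative), `None` stands for the empty tree, and each `Tree` value is a root with a list of sub trees that are themselves `Tree` values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tree:
    root: str
    subtrees: List["Tree"] = field(default_factory=list)  # zero or more sub trees, each a Tree


def count_nodes(tree: Optional["Tree"]) -> int:
    """Follows the recursive definition: None is the empty tree."""
    if tree is None:                      # case 1: the tree is empty
        return 0
    # case 2: a root from which zero or more sub trees descend
    return 1 + sum(count_nodes(t) for t in tree.subtrees)


example = Tree("A", [Tree("B", [Tree("C"), Tree("D")]), Tree("E")])
print(count_nodes(example))  # 5
```

The two branches of `count_nodes` mirror the two cases of the definition, which is the usual pattern for any recursive tree algorithm.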


Tree Representation


General Tree – organization chart format

Indented list – bill-of-materials system in which a parts list represents the assembly structure of an item
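An indented list can be produced by a preorder walk that indents each node one step deeper than its parent. The sketch assumes a hypothetical general tree written as nested (name, children) pairs; the two-space indent is an arbitrary choice.

```python
from typing import List, Tuple

# Hypothetical general tree: each node is a (name, children) pair.
Node = Tuple[str, list]

tree: Node = ("A", [("B", [("C", []), ("D", [])]),
                    ("E", []),
                    ("F", [("G", []), ("H", []), ("I", [])])])


def indented_list(node: Node, depth: int = 0, indent: str = "  ") -> List[str]:
    """Preorder walk: each node on its own line, indented one step deeper than its parent."""
    name, kids = node
    lines = [indent * depth + name]
    for kid in kids:
        lines.extend(indented_list(kid, depth + 1, indent))
    return lines


print("\n".join(indented_list(tree)))
```

In a bill-of-materials listing, each name would be a part and each indented group the parts it is assembled from.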


Parenthetical Listing


Parenthetical Listing – the algebraic expression, where each open parenthesis indicates the start of a new level and each closing parenthesis completes the current level and moves up one level in the tree.


A (B (C D) E F (G H I) )
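The listing above can be generated recursively: print a node's name, and if it has children, open a parenthesis (go down a level), list the children, and close it (come back up). This is a sketch over a hypothetical (name, children) representation of the same tree.

```python
from typing import Tuple

# The tree from the listing above, as hypothetical (name, children) pairs.
tree: Tuple[str, list] = ("A", [("B", [("C", []), ("D", [])]),
                                ("E", []),
                                ("F", [("G", []), ("H", []), ("I", [])])])


def parenthetical(node) -> str:
    """Node name, then its children inside parentheses (one level down)."""
    name, kids = node
    if not kids:
        return name
    return name + " (" + " ".join(parenthetical(k) for k in kids) + ")"


print(parenthetical(tree))  # A (B (C D) E F (G H I))
```

The output differs from the slide only in omitting the space before the final closing parenthesis.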


Binary Trees

A node in a binary tree can have no more than two children.

• Properties
• Binary Tree Traversals
• Expression Trees
• Huffman Code


A binary tree is a tree in which no node can have more than two sub trees; the maximum out degree for a node is two.

In other words, a node can have zero, one, or two sub trees.

These sub trees are designated as the left sub tree and the right sub tree.


A null tree is a tree with no nodes.

Some Properties of Binary Trees

The height of binary trees can be mathematically predicted. Given that we need to store N nodes in a binary tree, the maximum height is

$H_{\max} = N$

A tree with a maximum height is rare. It occurs when all of the nodes in the entire tree have only one successor.

The minimum height of a binary tree is determined as follows:

$H_{\min} = \lfloor \log_2 N \rfloor + 1$

For instance, if there are three nodes to be stored in the binary tree (N=3) then Hmin=2.

Given the height H of a binary tree, the minimum number of nodes in the tree is given as follows:

$N_{\min} = H$

The formula for the maximum number of nodes is derived from the fact that each node can have at most two children, so level i can hold at most $2^i$ nodes. Given the height H of a binary tree, summing over the levels 0 through H - 1 gives the maximum number of nodes in the tree:

$N_{\max} = 2^0 + 2^1 + \dots + 2^{H-1} = 2^H - 1$
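The four height/node-count formulas can be written as small functions and cross-checked against each other. This is a sketch; the function names are illustrative. It uses `int.bit_length()`, which equals ⌊log₂ N⌋ + 1 for N ≥ 1, to avoid floating-point rounding in the logarithm.

```python
def h_max(n: int) -> int:
    """Maximum height for n nodes: every node has only one successor."""
    return n


def h_min(n: int) -> int:
    """Minimum height for n >= 1 nodes: floor(log2 n) + 1, computed exactly with integers."""
    return n.bit_length()


def n_min(h: int) -> int:
    """Minimum number of nodes for height h: one node per level."""
    return h


def n_max(h: int) -> int:
    """Maximum number of nodes for height h: every level full, 2**0 + ... + 2**(h-1)."""
    return 2 ** h - 1


print(h_min(3), n_max(3))  # 2 7
```

The first call reproduces the N = 3 example above: three nodes need a height of at least 2.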

The children of any node in a tree can be accessed by following only one branch path, the one that leads to the desired node.

The nodes at level 1, which are children of the root, can be accessed by following only one branch; the nodes at level 2 can be accessed by following two branches from the root, and so on.

The balance factor of a binary tree is the difference in height between its left and right sub trees:

$B = H_L - H_R$

(Figure: Balance of the tree. Example binary trees labelled with their balance factors B = 0, 1, -1, 2 and -2.)

A binary tree is balanced if the heights of its sub trees differ by no more than one (its balance factor is -1, 0, or 1) and its sub trees are also balanced.
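A direct transcription of B = H_L - H_R and the recursive "balanced" definition is sketched below. The `Node` class is a hypothetical minimal node; heights count levels (empty tree 0), but any consistent convention gives the same differences. Recomputing heights at every node makes this O(n²), which is fine for illustration.

```python
from typing import Optional


class Node:
    def __init__(self, value, left: "Optional[Node]" = None, right: "Optional[Node]" = None):
        self.value, self.left, self.right = value, left, right


def height(node: Optional[Node]) -> int:
    """Number of levels (0 for the empty tree); only differences matter for B."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node: Node) -> int:
    """B = H_L - H_R."""
    return height(node.left) - height(node.right)


def is_balanced(node: Optional[Node]) -> bool:
    """|B| <= 1 at this node, and both sub trees are balanced as well."""
    if node is None:
        return True
    return (abs(balance_factor(node)) <= 1
            and is_balanced(node.left)
            and is_balanced(node.right))
```

The recursive check matters: a root with B = 0 can still sit above an unbalanced sub tree.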


Complete and nearly complete binary trees


A complete tree has the maximum number of entries for its height. The maximum number is reached when the last level is full.

A tree is considered nearly complete if it has the minimum height for its nodes and all nodes in the last level are found on the left.
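Both definitions can be tested with a short sketch (class and function names are hypothetical). "Complete" is checked by counting: N = 2^H - 1. "Nearly complete" is checked with a level-order scan that includes empty positions. Once an empty position appears, no node may follow it, which holds exactly when every level but the last is full and the last level is packed to the left.

```python
from collections import deque
from typing import Optional


class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right


def count(node: Optional[Node]) -> int:
    return 0 if node is None else 1 + count(node.left) + count(node.right)


def height(node: Optional[Node]) -> int:
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def is_complete(root: Optional[Node]) -> bool:
    """Complete: the maximum number of entries for its height, N == 2**H - 1."""
    return count(root) == 2 ** height(root) - 1


def is_nearly_complete(root: Optional[Node]) -> bool:
    """Level-order scan: after the first empty position, every later position must be empty."""
    queue = deque([root])
    seen_gap = False
    while queue:
        node = queue.popleft()
        if node is None:
            seen_gap = True
        elif seen_gap:
            return False          # a node to the right of (or below) a gap
        else:
            queue.append(node.left)
            queue.append(node.right)
    return True
```

Every complete tree is also nearly complete; the converse fails when the last level is only partly filled.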


Implementing Binary Trees

Just like other ADTs, we can implement a binary tree using pointers or arrays. In a pointer-based implementation, each node stores its data together with pointers to its left and right sub trees.

(Figure: a pointer-based binary tree. A Root pointer refers to the first node; each node holds a Data Value and two pointers, Left and Right, to the nodes of its sub trees, and so on down the tree.)

Traversal through BSTs

Remember that a binary tree is either empty or in the form of a Root with two sub trees. If the Root is empty, then the traversal algorithm should take no action (i.e., this is an empty tree, a "degenerate" case).

If the Root is not empty, then we need to print the information in the root node and start traversing the left and right sub trees.

When a sub tree is empty, then we stop traversing it.

The recursive traversal algorithm is:

Traverse (Root)
    If the Tree is not empty then
        Visit the node at the Root
        Traverse (Left sub tree)
        Traverse (Right sub tree)

When traversing any binary tree, the algorithm has 3 choices of when to process the root: before it traverses both sub trees, after it traverses the left sub tree, or after it traverses both sub trees. These traversal methods are called preorder, inorder, and postorder, respectively.


Binary Tree Traversal


A binary tree traversal requires that each node of the tree be processed once and only once in a predetermined sequence.

In a depth-first traversal, processing proceeds along a path from the root through one child to the most distant descendant of that first child before processing a second child.
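Depth-first order does not require recursion; an explicit stack gives the same preorder sequence. This is a sketch with a hypothetical `Node` dataclass, run on the example tree from the traversal slides (root 60). Pushing the right child before the left child ensures the entire left sub tree is finished before the right child is processed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    data: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def depth_first(root: Optional[Node]) -> List[int]:
    """Depth-first (preorder) traversal using an explicit stack instead of recursion."""
    order: List[int] = []
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        order.append(node.data)        # process each node exactly once
        # Push the right child first so the left child is popped next: the whole
        # left sub tree is finished before the right child is processed.
        if node.right is not None:
            stack.append(node.right)
        if node.left is not None:
            stack.append(node.left)
    return order


# Example tree used in the traversal examples of this lecture.
root = Node(60,
            Node(20, Node(10, Node(5), Node(15)), Node(40, Node(30))),
            Node(70, Node(65), Node(85)))
print(depth_first(root))
```

Each node is pushed and popped once, so every node is processed once and only once, as the definition of a traversal requires.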


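Preorder Traversal

A preorder traversal visits the nodes of the following tree in this order:

60, 20, 10, 5, 15, 40, 30, 70, 65, 85

(Figure: a binary search tree with 60 at the root; 60 has children 20 and 70; 20 has children 10 and 40; 10 has children 5 and 15; 40 has left child 30; 70 has children 65 and 85.)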


Inorder Traversal

An inorder traversal visits the same tree in this order:

5, 10, 15, 20, 30, 40, 60, 65, 70, 85

Notice that this type of traversal produces the numbers in ascending order. Search trees can be set up so that all of the nodes in the left sub tree are less than the root and all of the nodes in the right sub tree are greater than it; an inorder traversal of such a tree always lists its keys in sorted order.


Postorder Traversal

A postorder traversal visits the same tree in this order:

5, 15, 10, 30, 40, 20, 65, 85, 70, 60
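The three traversals differ only in where "visit the root" sits relative to the two recursive calls. The sketch below uses a hypothetical `Node` dataclass; the tree shape (root 60, children 20 and 70, and so on) is reconstructed from the example figure, and running it reproduces the three sequences listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    data: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def preorder(node: Optional[Node], out: List[int]) -> List[int]:
    if node is not None:          # an empty tree: take no action
        out.append(node.data)     # visit the root before both sub trees
        preorder(node.left, out)
        preorder(node.right, out)
    return out


def inorder(node: Optional[Node], out: List[int]) -> List[int]:
    if node is not None:
        inorder(node.left, out)
        out.append(node.data)     # visit the root between the sub trees
        inorder(node.right, out)
    return out


def postorder(node: Optional[Node], out: List[int]) -> List[int]:
    if node is not None:
        postorder(node.left, out)
        postorder(node.right, out)
        out.append(node.data)     # visit the root after both sub trees
    return out


root = Node(60,
            Node(20, Node(10, Node(5), Node(15)), Node(40, Node(30))),
            Node(70, Node(65), Node(85)))

print(preorder(root, []))   # [60, 20, 10, 5, 15, 40, 30, 70, 65, 85]
print(inorder(root, []))    # [5, 10, 15, 20, 30, 40, 60, 65, 70, 85]
print(postorder(root, []))  # [5, 15, 10, 30, 40, 20, 65, 85, 70, 60]
```

"Visit" here simply appends the value to a list; printing or any other processing could take its place.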

Tree Representations

There are many different ways to represent trees. Common representations store the nodes as records allocated on the heap with pointers to their children, their parents, or both, or as items in an array, with relationships between them determined by their positions in the array (e.g., a binary heap).
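In the array representation, positions replace pointers. This sketch assumes the common 0-based layout used by binary heaps, where the children of position i sit at 2i + 1 and 2i + 2. It stores the traversal example tree (root 60) level by level. That tree happens to be nearly complete, so the array has no gaps; other shapes would need placeholder entries.

```python
from typing import List, Optional, Tuple

# The traversal example tree (root 60) stored level by level; position i's
# children live at 2*i + 1 and 2*i + 2 (0-based), the layout a binary heap uses.
tree: List[int] = [60, 20, 70, 10, 40, 65, 85, 5, 15, 30]


def children(a: List[int], i: int) -> Tuple[Optional[int], Optional[int]]:
    """Values at the left and right child positions of index i (None if past the end)."""
    kids = []
    for j in (2 * i + 1, 2 * i + 2):
        kids.append(a[j] if j < len(a) else None)
    return kids[0], kids[1]


def parent_index(i: int) -> Optional[int]:
    """Index of the parent of position i; the root (i == 0) has none."""
    return None if i == 0 else (i - 1) // 2
```

For instance, `children(tree, 4)` gives `(30, None)`: 40 has only a left child.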

Trees and Graphs

The tree data structure can be generalized to represent directed graphs by removing the constraints that a node may have at most one parent, and that no cycles are allowed.

Edges are still abstractly considered as pairs of nodes; however, the terms parent and child are usually replaced by different terminology (for example, source and target).


Relationship with Trees in Graph Theory

• In graph theory, a tree is a connected acyclic graph; unless stated otherwise, trees and graphs are undirected.

• There is no one-to-one correspondence between such trees and trees as data structures.

• We can take an arbitrary undirected tree, arbitrarily pick one of its vertices as the root, make all its edges directed by making them point away from the root node (producing an arborescence), and assign an order to all the nodes.

The result corresponds to a tree data structure. Picking a different root or a different ordering produces a different one.
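The construction above (pick a root, point edges away from it, fix an order) can be sketched with a breadth-first search. The adjacency list is a hypothetical example. Sorting the neighbours is one arbitrary choice of child order, and calling the function with a different root yields a different tree data structure from the same undirected tree.

```python
from collections import deque
from typing import Dict, List

# Hypothetical undirected tree given as an adjacency list (each edge listed in both directions).
undirected: Dict[int, List[int]] = {
    1: [2, 3],
    2: [1, 4, 5],
    3: [1],
    4: [2],
    5: [2],
}


def orient(adj: Dict[int, List[int]], root: int) -> Dict[int, List[int]]:
    """Make every edge point away from `root`; return each vertex's ordered list of children."""
    children: Dict[int, List[int]] = {v: [] for v in adj}
    seen = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in sorted(adj[v]):   # sorting is one arbitrary choice of child order
            if w not in seen:      # the neighbour we came from is already seen
                seen.add(w)
                children[v].append(w)
                queue.append(w)
    return children


print(orient(undirected, 1))  # {1: [2, 3], 2: [4, 5], 3: [], 4: [], 5: []}
print(orient(undirected, 2))  # {1: [3], 2: [1, 4, 5], 3: [], 4: [], 5: []}
```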

Common Operations

• Enumerating all the items
• Enumerating a section of a tree
• Searching for an item
• Adding a new item at a certain position on the tree
• Deleting an item
• Removing a whole section of a tree (called pruning)
• Adding a whole section to a tree (called grafting)
• Finding the root for any node
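Several of these operations become short methods once each node keeps a parent pointer. The `TreeNode` class below is a hypothetical illustration, not a standard API. Grafting and pruning attach or detach a whole section in one step, finding the root follows parent pointers upward, and enumeration walks a section in preorder.

```python
from typing import Iterator, List, Optional


class TreeNode:
    """General tree node with a parent pointer (hypothetical class for illustration)."""

    def __init__(self, name: str):
        self.name = name
        self.parent: Optional["TreeNode"] = None
        self.children: List["TreeNode"] = []

    def graft(self, section: "TreeNode") -> None:
        """Grafting: attach a whole section (a sub tree) below this node."""
        section.parent = self
        self.children.append(section)

    def prune(self, section: "TreeNode") -> None:
        """Pruning: detach the whole section rooted at `section` from this node."""
        self.children.remove(section)
        section.parent = None

    def find_root(self) -> "TreeNode":
        """Follow parent pointers upward until a node with no parent is reached."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def items(self) -> Iterator[str]:
        """Enumerate every item in this node's section of the tree (preorder)."""
        yield self.name
        for child in self.children:
            yield from child.items()
```

After pruning, the detached section is a tree in its own right, so `find_root` on any of its nodes returns the section's top node.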

Common Uses

• Manipulate hierarchical data
• Make information easy to search
• Manipulate sorted lists of data
• As a workflow for compositing digital images for visual effects
• Router algorithms

Search Trees

A binary search tree can be created so that the elements in it satisfy an ordering property. This allows elements to be searched for quickly. All of the elements in the left sub-tree are less than the element at the root, which is less than all of the elements in the right sub-tree, and this property applies recursively to all the sub-trees.

The great advantage of this is that when searching for an element, a comparison with the root will either find the element or indicate which sub-tree to search.

The ordering is an invariant property of the search tree. All routines that operate on the tree can make use of it, provided that they also preserve it.
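Because each comparison either finds the key or rules out one sub tree, search is a simple loop that walks down a single path. This sketch uses a hypothetical `Node` dataclass and also returns the keys compared along the way, so the result can be checked against the worked "Find" examples that follow (tree rooted at 10).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    data: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def find(root: Optional[Node], key: int) -> Tuple[Optional[Node], List[int]]:
    """Search a binary search tree; return (node or None, keys compared along the way)."""
    path: List[int] = []
    node = root
    while node is not None:
        path.append(node.data)
        if key == node.data:
            return node, path                                    # found
        node = node.left if key < node.data else node.right      # one comparison picks one sub tree
    return None, path                                            # fell off the tree: not present


# Example search tree (root 10) used in the following slides.
root = Node(10, Node(5, Node(2)), Node(30, Node(25), Node(45)))
```

The number of comparisons is at most the height of the tree, which is why the shape of the tree matters for search time.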

Binary Search Trees

Key property: for the value at a node,
• smaller values are in its left sub-tree
• larger values are in its right sub-tree

Example: with X at a node, Y as its left child and Z as its right child, X > Y and X < Z.

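Binary Search Trees: Examples

Binary search trees:
• 10 at the root, with children 5 and 30; 2 is the left child of 5; 25 and 45 are the children of 30.
• 5 at the root, with children 2 and 45; 30 is the left child of 45; 10 is the left child of 30; 25 is the right child of 10.

Not a binary search tree:
• The same six keys with 10 at the root, 5 and 45 as its children, and 2, 25 and 30 on the level below, placed so that the ordering property does not hold.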

Example Binary Searches: Find (root, 2)

In the tree with 10 at the root:
10 > 2, go left
5 > 2, go left
2 = 2, found

In the tree with 5 at the root:
5 > 2, go left
2 = 2, found

Example Binary Searches: Find (root, 25)

In the tree with 10 at the root:
10 < 25, go right
30 > 25, go left
25 = 25, found

In the tree with 5 at the root:
5 < 25, go right
45 > 25, go left
30 > 25, go left
10 < 25, go right
25 = 25, found

Types of Binary Trees

• Degenerate – only one child
• Complete – always two children
• Balanced – “mostly” two children

More formal definitions exist; the above are intuitive ideas.

(Figure: a degenerate binary tree, a balanced binary tree, and a complete binary tree.)

Binary Trees Properties

• Degenerate: height = O(n) for n nodes; similar to a linked list
• Balanced: height = O(log n) for n nodes; useful for searches

(Figure: a degenerate binary tree and a balanced binary tree.)

Binary Search Properties

Time of search is proportional to the height of the tree:
• Balanced binary tree: O(log n) time
• Degenerate tree: O(n) time, like searching a linked list or an unsorted array

Binary Search Tree Construction

How do we build and maintain binary search trees?
• Insertion
• Deletion

Both operations must maintain the key property (invariant): smaller values in the left sub tree, larger values in the right sub tree.
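The key property can be verified with a short recursive check, assuming distinct keys. Comparing each node only with its own children is not enough: a key deep in a left sub tree must also be smaller than every ancestor whose left sub tree contains it. The sketch therefore carries (low, high) bounds down from the ancestors. The `Node` dataclass and `is_bst` name are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    data: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_bst(node: Optional[Node], low: Optional[int] = None, high: Optional[int] = None) -> bool:
    """True if every key lies strictly between the bounds set by its ancestors (distinct keys)."""
    if node is None:
        return True
    if (low is not None and node.data <= low) or (high is not None and node.data >= high):
        return False
    # Keys in the left sub tree must stay below node.data; keys on the right must stay above it.
    return is_bst(node.left, low, node.data) and is_bst(node.right, node.data, high)
```

A check like this is handy after writing insertion or deletion code, to confirm the invariant still holds.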

Example Insertion: Insert (20)

Starting from the tree with 10 at the root (5 and 30 as its children, 2 under 5, 25 and 45 under 30):
10 < 20, go right
30 > 20, go left
25 > 20, go left
Insert 20 as the left child of 25.
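Insertion follows the same path as a failed search and hangs the new key where the search fell off the tree, so the key property is preserved. This is a sketch with a hypothetical `Node` dataclass; sending equal keys to the right is an assumption, since the slides do not say how duplicates are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    data: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert key as a new leaf, preserving the key property; return the (sub)tree root.

    Keys equal to a node's key are sent right (an assumption for this sketch)."""
    if root is None:
        return Node(key)                       # empty position found: new leaf goes here
    if key < root.data:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def height(node: Optional[Node]) -> int:
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


root = Node(10, Node(5, Node(2)), Node(30, Node(25), Node(45)))
insert(root, 20)   # 10 < 20 right, 30 > 20 left, 25 > 20 left: new left child of 25
```

Insertion never rebalances. Feeding it keys that are already sorted (for example 1 to 10) produces a degenerate tree whose height equals the number of keys.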

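Example Deletion (Leaf): Delete (25)

Starting from the tree with 10 at the root (5 and 30 as its children, 2 under 5, 25 and 45 under 30):
10 < 25, go right
30 > 25, go left
25 = 25, delete
Because 25 is a leaf, it is simply removed; 30 is left with only its right child, 45.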

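Example Deletion (Internal Node): Delete (10)

Starting from the tree with 10 at the root (5 and 30 as its children, 2 under 5, 25 and 45 under 30):
Replace 10 with the largest value in its left subtree, 5.
Replace the old 5 with the largest value in its left subtree, 2.
Delete the leaf that held 2.
The result has 5 at the root, 2 as its left child, and 30 (with children 25 and 45) as its right child.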

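Example Deletion (Internal Node): Delete (10)

Alternatively, starting from the tree with 10 at the root (5 and 30 as its children, 2 under 5, 25 and 45 under 30):
Replace 10 with the smallest value in its right subtree, 25.
Delete the leaf that held 25.
Resulting tree: 25 at the root; 5 (with left child 2) as its left child; 30 (with right child 45) as its right child.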
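Deletion by "smallest value in the right subtree", as in the last example, can be sketched as follows (hypothetical `Node` dataclass). A node with at most one child is spliced out by returning that child. A node with two children takes a copy of its in-order successor, which is then deleted from the right subtree; the successor has no left child, so that second deletion is an easy case. The "largest value in the left subtree" variant is the mirror image.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    data: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def delete(root: Optional[Node], key: int) -> Optional[Node]:
    """Delete key from the BST rooted at root and return the new root."""
    if root is None:
        return None                                   # key not present
    if key < root.data:
        root.left = delete(root.left, key)
    elif key > root.data:
        root.right = delete(root.right, key)
    elif root.left is None:                           # leaf, or only a right child
        return root.right
    elif root.right is None:                          # only a left child
        return root.left
    else:
        # Two children: copy in the smallest value of the right subtree, then delete
        # that value from the right subtree (its node has no left child).
        successor = root.right
        while successor.left is not None:
            successor = successor.left
        root.data = successor.data
        root.right = delete(root.right, successor.data)
    return root


def inorder(node: Optional[Node]) -> List[int]:
    return [] if node is None else inorder(node.left) + [node.data] + inorder(node.right)


def example() -> Node:
    return Node(10, Node(5, Node(2)), Node(30, Node(25), Node(45)))
```

`delete(example(), 10)` leaves 25 at the root, matching the resulting tree above.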

Balanced Search Trees

• Kinds of balanced binary search trees: height balanced vs. weight balanced
• “Tree rotations” are used to maintain balance on insert/delete
• Non-binary search trees, such as 2/3 trees: each internal node has 2 or 3 children, and all leaves are at the same depth (height balanced)
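A tree rotation is the local restructuring step that balanced search trees use. This sketch uses a hypothetical `Node` dataclass and shows single right and left rotations. Each one changes which node is on top while keeping the in-order sequence of keys, and therefore the BST property, unchanged. When to rotate is decided by the specific balancing scheme (for example, AVL rules) and is not shown here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    data: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def rotate_right(y: Node) -> Node:
    """Lift y's left child x above y; keys keep their in-order sequence."""
    x = y.left
    y.left = x.right      # x's right sub tree holds keys between x and y
    x.right = y
    return x              # new root of this sub tree; the caller re-links it


def rotate_left(x: Node) -> Node:
    """Mirror image of rotate_right."""
    y = x.right
    x.right = y.left
    y.left = x
    return y


def inorder(node: Optional[Node]) -> List[int]:
    return [] if node is None else inorder(node.left) + [node.data] + inorder(node.right)
```

Rotating the left-leaning chain 3, 2, 1 to the right, for example, yields 2 at the root with children 1 and 3.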

B-trees

• Generalization of 2/3 trees
• Each node has an array of pointers to children
• Widely used in databases

AVL Tree

An AVL (Adelson-Velskii and Landis) tree, also called a height-balanced binary search tree, is identical to a BST except that:
• the heights of the left and right sub trees can differ by at most 1 (the height of an empty tree is defined to be -1), and
• every sub tree is an AVL tree.
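The AVL condition can be checked in a single bottom-up pass that computes heights (empty tree = -1, as in the definition) and reports failure as soon as some node's sub tree heights differ by more than 1. This sketch uses a hypothetical `Node` dataclass and checks only the height condition; it assumes the BST ordering already holds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    data: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def avl_height(node: Optional[Node]) -> Optional[int]:
    """Height with the empty tree at -1, or None if some node's sub tree heights differ by more than 1."""
    if node is None:
        return -1
    hl = avl_height(node.left)
    hr = avl_height(node.right)
    if hl is None or hr is None or abs(hl - hr) > 1:
        return None                      # this sub tree is not an AVL tree
    return 1 + max(hl, hr)


def is_avl(node: Optional[Node]) -> bool:
    """Checks only the height condition; the BST ordering is assumed to hold already."""
    return avl_height(node) is not None
```

Because each node is visited once, the check runs in O(n) time, unlike the simpler version that recomputes heights at every node.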

An AVL Tree

(Figure: an AVL tree containing the keys 1, 2, 3, 4, 5, 7 and 8; node heights 0 to 3 are marked.)

Not an AVL tree

(Figure: a binary search tree that violates the AVL height condition; node heights 0 to 3 are marked.)


Example

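An example of an AVL tree where the heights are shown next to the nodes:

(Figure: 44 at the root (height 4); its left child 17 (height 2) has right child 32 (height 1); its right child 78 (height 3) has children 50 (height 2) and 88 (height 1); 50 has children 48 and 62 (height 1 each).)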


Balanced Binary Tree

The height of a binary tree is the maximum level of its leaves (also called the depth).

The balance of a node in a binary tree is defined as the height of its left sub tree minus height of its right sub tree.

In a balanced binary tree, each node has a balance of 1, 0, or -1.

B-Tree

A B-tree of order m is an m-way tree such that:
• all leaves are on the same level;
• all internal nodes except the root have at most m non-empty children and at least ⌈m/2⌉ non-empty children;
• the root has at most m non-empty children.

B-tree of order 2 (order here in Bayer and McCreight's sense: every node other than the root holds between 2 and 4 keys):

(Figure: a B-tree holding the keys 2, 4, 5, 6, 7, 8, 9, 11, 12, 16, 17, 18, 20, 21, 22, 23, 25, 26, 27, 29, 30, 31, 32 and 35.)

A B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time.

The B-tree is a generalization of a binary search tree in that a node can have more than two children. The B-tree is optimized for systems that read and write large blocks of data.

It is commonly used in databases and file systems.

A B-tree is a specialized multi-way tree designed especially for use on disk. In a B-tree each node may contain a large number of keys.

The number of sub trees of each node, then, may also be large.

A B-tree is designed to branch out in this large number of directions and to contain a lot of keys in each node so that the height of the tree is relatively small.

This means that only a small number of nodes must be read from disk to retrieve an item.

The goal is to get fast access to the data, and with disk drives this means reading a very small number of records.

For example, consider a multiway search tree of order 4 in which the first row of each node shows the keys and the second row shows the pointers to the child nodes.

There is a record of data associated with each key, so that the first row in each node might be an array of records where each record contains a key and its associated data.

Another approach would be to have the first row of each node contain an array of records where each record contains a key and a record number for the associated data record, which is found in another file.

This is often used when the data records are large.
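Searching a multiway node works like a BST comparison generalized to several keys: find the first key that is not smaller than the target; either it matches, or the child pointer just before it leads to the only sub tree that can hold the target. This sketch uses a hypothetical order-4 tree (at most 3 keys and 4 children per node) and Python's standard `bisect_left`. It also counts node reads, each of which would correspond to one disk access in a B-tree.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MNode:
    keys: List[int]                                         # first row: keys in sorted order
    children: List["MNode"] = field(default_factory=list)  # second row: len(keys) + 1 pointers, or none in a leaf


def search(root: Optional[MNode], key: int) -> Tuple[bool, int]:
    """Return (found, nodes_read); each node read stands for one disk access in a B-tree."""
    node, reads = root, 0
    while node is not None:
        reads += 1
        i = bisect_left(node.keys, key)          # first position where keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True, reads
        if not node.children:                    # a leaf: nowhere left to look
            return False, reads
        node = node.children[i]                  # child i holds keys between keys[i-1] and keys[i]
    return False, reads


# Hypothetical order-4 tree: at most 3 keys and 4 children per node.
root = MNode([20, 40], [MNode([5, 10]), MNode([25, 30, 35]), MNode([45, 50])])
```

Only the associated data records are left out; in the layouts described above, each key would carry its record or a record number.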


Records are stored in locations called leaves. This name derives from the fact that records always exist at end points; there is nothing beyond them. The maximum number of children per node is the order of the tree. The number of required disk accesses is the depth. The image at left shows a binary tree for locating a particular record in a set of eight leaves.


The image at right shows a B-tree of order three for locating a particular record in a set of eight leaves (the ninth leaf is unoccupied, and is called a null).

The binary tree at left has a depth of four; the B-tree at right has a depth of three.

Clearly, the B-tree allows a desired record to be located faster, assuming all other system parameters are identical.
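The depths quoted above can be reproduced with a little arithmetic. Each additional level multiplies the number of reachable leaves by the order, so the depth is the smallest number of levels whose capacity covers all the leaves. This sketch assumes "depth" counts levels from the root to the leaves inclusive, which matches the figures of four and three given above. With order 3, three levels reach 9 leaves, one more than needed, which is the unoccupied ninth leaf.

```python
def depth(leaves: int, order: int) -> int:
    """Levels from the root down to the leaves (both counted) needed to reach `leaves`
    leaves when every node has at most `order` children."""
    levels, capacity = 1, 1          # a lone root: one level, room for one leaf
    while capacity < leaves:
        capacity *= order            # one more level multiplies the reachable leaves
        levels += 1
    return levels


print(depth(8, 2), depth(8, 3))  # 4 3
```

With a large order the effect is much stronger: a million leaves need only 4 levels when each node has up to 100 children.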

A sophisticated program is required to execute the operations in a B-tree. But this program is stored in RAM, so it runs fast.