CS305/503, Spring 2009 Self-Balancing Trees Michael Barnathan

Page 1:

CS305/503, Spring 2009Self-Balancing Trees

Michael Barnathan

Page 2:

Here’s what we’ll be learning:

• Data Structures:
  – AVL Trees.
  – B+ Trees.

Page 3:

AVL Trees – The Idea

• We looked at an algorithm for balancing trees using rotations last time.
• This turns out to be a pretty good strategy in general.
  – Rotations are O(1): they only affect up to 2 levels of the tree, no matter how deep it is.
  – As in the DSW algorithm, rotations can be used to maintain tree balance.
  – The trick is knowing when to apply them.
• A left rotation will decrease the right subtree’s height and increase the left subtree’s height.
• A right rotation will do the opposite.
  – Recall: a balanced tree is one in which the depths of the leaves differ by no more than one level.
• We can enforce this condition using rotations!
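The rotate functions referenced later in these slides ("refer to Lecture 9") are not reproduced in this deck; a minimal sketch of what single rotations might look like, using a bare illustrative node class and returning the new subtree root (these names are assumptions, not the lecture's actual code):

```java
// Sketch only: single tree rotations, assuming a bare node class.
// Each rotation returns the new root of the rotated subtree.
class Node {
    int value;
    Node left, right;
    Node(int value) { this.value = value; }
}

class Rotations {
    // Right rotation: the left child becomes the new subtree root.
    // Decreases the left subtree's height and increases the right's.
    static Node rotateRight(Node root) {
        Node pivot = root.left;
        root.left = pivot.right;  // Pivot's right subtree moves across.
        pivot.right = root;
        return pivot;
    }

    // Left rotation: the mirror image of rotateRight.
    static Node rotateLeft(Node root) {
        Node pivot = root.right;
        root.right = pivot.left;
        pivot.left = root;
        return pivot;
    }
}
```

For example, right-rotating the left-heavy chain 3 → 2 → 1 yields 2 as the new root with children 1 and 3, restoring balance in one O(1) step.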

Page 4:

Balance Factor

• The difference between the heights of a node’s right and left subtrees is called the node’s balance factor.
  – Balance Factor = Height(Right) - Height(Left).
  – Some sources define it as Height(Left) - Height(Right), but this does not change anything.
• Leaves, having no children, have a balance factor of 0 (Height(right) = Height(left) = 0).
• By the definition of tree balance, a subtree is considered balanced if its balance factor is -1, 0, or 1.
• A left rotation will lower the balance factor.
• A right rotation will raise it.
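The definition above can be computed directly from subtree heights; a small sketch (illustrative names; it recomputes heights recursively, whereas real AVL nodes cache the factor in a field, as the AVLTree class later in this deck does):

```java
// Sketch: computing a balance factor by recomputing subtree heights.
class BFNode {
    int value;
    BFNode left, right;
    BFNode(int value) { this.value = value; }
}

class Balance {
    // Height of an empty subtree is 0, so a leaf has a balance
    // factor of 0, matching the slide's convention.
    static int height(BFNode n) {
        if (n == null) return 0;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    // Balance factor = Height(right) - Height(left).
    static int balanceFactor(BFNode n) {
        return height(n.right) - height(n.left);
    }
}
```

A leaf yields 0; a node with only a left child yields -1, already at the edge of the balanced range.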

Page 5:

Balance Factor

[Diagram: two example trees with the balance factor shown at each node.

Tree 1 – nodes 1, 2, 3, 4, 5; balance factors -1, 0, 0, 0, 0.
No balance factor < -1 or > 1: this tree is balanced.

Tree 2 – nodes 1, 2, 3, 4; the root has a balance factor of -2.
This tree is not balanced. A right rotation will balance it.]

Page 6:

AVL Trees - Structure

• Small modification to a node’s structure:

class BinaryTree {
    int value;
    BinaryTree left;
    BinaryTree right;
}

class AVLTree extends BinaryTree {
    // value, left, and right are inherited.
    int balanceFactor;
}

Page 7:

AVL Trees: Algorithms

• Insertion and deletion must keep the balance. Access doesn’t change.
• Insertion:
  – Insert as in a normal binary search tree, but go back up the tree and update the balance factor of each node back towards the root.
  – “Go back up the tree” -> do something after the recursive call / on the “pop”.
  – If the balance factor becomes +2 or -2, rotate to correct it.
  – Four different cases involving up to 2 rotations.
• Deletion:
  – Delete as in a normal binary search tree (replacing the node with its inorder successor), but go back up the tree and adjust the balance factors.
  – If the balance factor becomes +2 or -2, rotate to correct it.
  – If the balance factor becomes +1 or -1, we can stop.
    • This indicates that the height of the subtree hasn’t changed.
  – If the balance factor becomes 0, we must keep going.
  – The deletion algorithm is very similar to the BST algorithm, so I won’t present it formally.

Page 8:

Insertion Cases (Wikipedia)

[Figure: the four AVL insertion cases, omitted in this transcription.]

Note the similarity to tree_to_list.

Page 9:

Visual AVL Demonstration

• http://webpages.ull.es/users/jriera/Docencia/AVL/AVL%20tree%20applet.htm

Page 10:

Insertion Algorithm

// Refer to Lecture 9 for the rotate functions.
void insert(AVLTree root, AVLTree newtree) {
    // This can only happen now if the user passes in an empty tree.
    // (Note: in Java this assignment is not visible to the caller.)
    if (root == null)
        root = newtree; // Empty. Insert the root.
    else if (newtree.value < root.value) { // Go left if <.
        if (root.left == null) // Found a place to insert.
            root.left = newtree;
        else
            insert(root.left, newtree); // Keep traversing.
    } else { // Go right if >=.
        if (root.right == null)
            root.right = newtree; // Found a place to insert.
        else
            insert(root.right, newtree); // Keep traversing.
    }
    updateBalance(root);
}

void updateBalance(AVLTree root) {
    // Note that a balance factor of -1 guarantees a left child exists.
    if (root.balanceFactor < -1 && root.left.balanceFactor < 0) {
        rotateRight(root); // Left-left case: rotate right once.
        root.right.balanceFactor = root.balanceFactor++;
    } else if (root.balanceFactor < -1) {
        // Left-right case: rotate the left child left, then rotate the root right.
        rotateLeft(root.left);
        rotateRight(root);
        root.left.balanceFactor = -1 * Math.max(root.balanceFactor, 0);
        root.right.balanceFactor = -1 * Math.min(root.balanceFactor, 0);
        root.balanceFactor = 0;
    } else if (root.balanceFactor > 1 && root.right.balanceFactor > 0) {
        rotateLeft(root); // Right-right case.
        root.left.balanceFactor = root.balanceFactor--;
    } else if (root.balanceFactor > 1) {
        // Right-left case: rotate the right child right, then rotate the root left.
        rotateRight(root.right);
        rotateLeft(root);
        root.left.balanceFactor = -1 * Math.max(root.balanceFactor, 0);
        root.right.balanceFactor = -1 * Math.min(root.balanceFactor, 0);
        root.balanceFactor = 0;
    }
}

Page 11:

Insertion Analysis

• We go down the tree to insert.
• We go back up the tree and rotate.
• AVL trees are always balanced, so what is the complexity of this operation?

Page 12:

CRUD: AVL Trees.

• Insertion: O(log n).
• Access: O(log n).
• Updating an element: O(log n).
• Deleting an element: O(log n).
• Search: O(log n).
• Traversal: O(n).

• This is a winner.
• We have all of the nice BST properties, without having to worry about balance.
• This does, however, require O(n) extra space to store the balance factor for each node.

Page 13:

B+ Trees: Motivation

• Binary search trees are very useful data structures when data lives in memory.
• However, they are not good for disk access.
  – Disk access is very slow compared to memory.
  – Traversing a BST is a mess on disk.
    • If each node is stored somewhere on the disk, even a simple traversal requires a great deal of random access.
    • Random access is difficult to cache.
    • Range queries in particular perform poorly.
  – Nodes do not align to “blocks” on disk.
    • Disks can only read data one “block” at a time. If we need less than one block, we waste time reading data that isn’t used.
• A self-balancing tree called a B+ tree can solve these issues.
• These are used in several popular filesystems, including NTFS, ReiserFS, XFS, and JFS.
• They are also used to index tables in database systems, such as MySQL.

Page 14:

B+ Trees: Idea

• Very different from what we’ve seen.
• First, they grow UP, not DOWN.
• They are not binary; each node contains an array of n values and points to n+1 children.
• Only leaves hold actual values; interior nodes hold the maximum value in the corresponding leaf. This is used as a means of indexing.
  – Some variations use the minimum or middle.
• They are “threaded”:
  – Each leaf points to the next in sorted order.
  – This makes sequential access and range queries fast.
• If each variable occupies a bytes and your device’s block size is b, the optimal size of the array is b / a - 1.
  – One level of the tree would then fill one block.
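The “threaded” leaf level is what makes range queries cheap: once the first relevant leaf has been located, the rest of the range is a linked-list walk with no further descents. A minimal sketch of that walk (the Leaf class and names here are illustrative assumptions, not the lecture's code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: scanning a range [lo, hi] across threaded B+ tree leaves.
// Assumes we have already descended to the first leaf that may hold lo.
class Leaf {
    int[] values; // Sorted values stored in this leaf.
    Leaf next;    // Thread: pointer to the next leaf in sorted order.
    Leaf(int... values) { this.values = values; }
}

class RangeScan {
    static List<Integer> scan(Leaf start, int lo, int hi) {
        List<Integer> result = new ArrayList<>();
        for (Leaf leaf = start; leaf != null; leaf = leaf.next) {
            for (int v : leaf.values) {
                if (v > hi) return result; // Past the range: stop.
                if (v >= lo) result.add(v);
            }
        }
        return result;
    }
}
```

For example, with leaves [7, 14] → [20] → [22, 26] → [28, 34, 97], scanning the range [14, 26] collects 14, 20, 22, 26 and stops as soon as 28 is seen.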

Page 15:

B+ Trees

Interior: [ 14 | 20 | 26 | • ]
Leaves:   [7 14] -> [20] -> [22 26] -> [28 34 97]

Note that each value in an interior node is the maximum of its leaves’ values. The last pointer points to a child containing greater elements.

Advantage: you can tell which leaf to read using only the interior node (one disk read). The other leaves do not have to be read.

Page 16:

B+ Tree Structure

• We define k as the order of a B+ tree.
• Each node is allowed to store up to 2k values and 2k+1 pointers.
• The structure looks like this:

class BPTree {
    static final int ORDER = 2; // You choose this.
    int[] values = new int[ORDER * 2];
    BPTree[] children = new BPTree[ORDER * 2 + 1];
}

Page 17:

Search

• Each node contains the maximum value of its keys. We can use this to locate the node to descend to when searching.

BPTree search(BPTree root, int val) {
    if (root == null)
        return null;
    for (int childidx = root.values.length - 1; childidx >= 0; childidx--) {
        if (root.children[0] == null && root.values[childidx] == val) // Found in a leaf.
            return root;
        if (val > root.values[childidx])
            return search(root.children[childidx + 1], val);
    }
    return search(root.children[0], val);
}

Page 18:

Search Example

Interior: [ 14 | 20 | 26 | • ]
Leaves:   [7 14] -> [20] -> [22 26] -> [28 34 97]

Search for 17.

Childidx = 2. Val > 26? No.
Childidx = 1. Val > 20? No.
Childidx = 0. Val > 14? Yes. Traverse down 20 (child[childidx+1]).
20 is a leaf. Val = 20? No. Return null.

Page 19:

Insertion: Split of a Leaf Node

• If the node is not full, we can just locate the proper position in the node’s array of keys to insert.
• However, if the node is full, we need to split the node. This is how the tree grows.
• Insertion always begins at the leaves.
• When a leaf splits, a new leaf is created, which becomes that leaf’s successor.
  – The lower half of the old leaf’s values stay, while the upper half move to the new leaf. The parent value for the old leaf becomes the new maximum of the values remaining in the leaf.
  – The old leaf is then linked to the new leaf.
• That means the leaf’s parent must point to this new node.
  – So we insert the new leaf into the parent.
  – Ack, there’s a problem here!

Page 20:

Split of an Internal Node

• What if the parent is also full when we try to insert the new leaf into it?
• We then have to split the parent.
• This is similar to a leaf-node split (cut the node in half, move the maximum up), with one crucial difference:
  – When you move the old maximum to the parent, you remove it from the current node.
  – Internal nodes don’t contain actual data values (only index keys), so this is OK.
• Now we’re inserting into this node’s parent.
  – And that means that node can split as well!
• When will the insanity end!?

Page 21:

Root Split

• When you reach the root, of course.
• When the root splits in two, a new root is created pointing to the old root and its new sibling, which are now its children.
• This increases the height of the tree by 1.
• So it is possible for one insertion to cascade splits all the way up the tree.
  – What do you think the complexity is, then?

Page 22:

B+ Tree Deletion

• As always, deletion is insertion in reverse.
• As a rule, B+ tree nodes should always be at least halfway full (that is why the order is half of the maximum number of values per node).
• If deletion causes a node to fall below this size, we will have to undo splits.
• But first, the easy case:
  – If the leaf we remove from is more than half full, we simply remove the value and we’re finished.

Page 23:

“Borrowing” Values

• If we fall below the halfway threshold but the next sibling of this leaf is above the threshold, just move the first value of the sibling into the last position of the current node.
• Since this is larger than anything currently in the node by definition, update the parent’s maximum as well.
• Values can be borrowed from the previous sibling as well.
  – If neither sibling can spare an element, there’s no choice but to merge.
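Borrowing from the next sibling can be sketched on plain arrays as well (an illustrative helper, not the lecture's code; it returns the node's new maximum so the caller can update the parent's key, and the caller is assumed to adjust the siblings' size counters afterwards):

```java
// Sketch: borrowing the first value of the right sibling into an
// underflowing leaf. Both arrays are sorted; sizes track used slots.
class Borrow {
    // Moves sibling[0] to the end of node; returns node's new maximum.
    static int borrowFromNext(int[] node, int nodeSize, int[] sibling, int siblingSize) {
        node[nodeSize] = sibling[0]; // Larger than everything in node.
        // Shift the sibling's remaining values left by one slot.
        // (The caller then decrements the sibling's size counter.)
        System.arraycopy(sibling, 1, sibling, 0, siblingSize - 1);
        return node[nodeSize]; // Update the parent's key to this value.
    }
}
```

For example, borrowing into the one-value leaf [7] from the sibling [20, 22, 26] moves 20 across, making the leaf [7, 20] with new maximum 20 and leaving [22, 26] in the sibling.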

Page 24:

Merging

• The inverse of a split is called a merge or coalesce operation.
• The inability to borrow ensures that this node and its sibling are both at most half full.
• Therefore, we can merge the two siblings into one node.
• We then delete the pointer to the sibling from the parent node (and from the linked list of siblings, of course).
• This in turn can cause the parent to underflow…
  – Fortunately, internal nodes and leaves are merged in the same way.
• If the merge propagates to the root, the old root disappears and the height of the tree drops by 1.

Page 25:

Example

• The algorithms for these operations are complex. I haven’t decided whether we should discuss them in detail yet.
• First, make sure you understand the idea of what is going on.
• This will help:
  – http://people.ksp.sk/~kuko/bak/index.html
  – That applet demonstrates B-trees, not B+ trees, but I’ll point out the differences.
  – http://www.seanster.com/BplusTree/BplusTree.html
  – This is a B+-tree implementation, but uses the middle element rather than the maximum.

Page 26:

CRUD: B+ Trees.

• Insertion: O(log n).
• Access: O(log n).
• Update: O(log n).
• Delete: O(log n).
• Search: O(log n).
• Traversal: O(n).

• These are the same asymptotic performances as in AVL trees.
• The primary advantage of the B+ tree over the AVL tree is in disk performance and indexing.
  – There’s also no balance factor, but you waste more space with half-empty arrays in the worst case.
• You also have that nice linked list structure to traverse.

Page 27:

Our Last Balancing Act

• We’ve devoted a lot of time to tree balance.
• Next time, we’ll move on to heaps and heapsort, and we’ll revisit priority queues.
• The lesson:
  – Automatic solutions save time with repeated use, but often carry a higher initial cost.