29
Binary Search Trees CMSC 420: Lecture 6

Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Binary Search TreesCMSC 420: Lecture 6

Page 2: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Binary Tree Traversals

H I J

D E F

B

G

C

A

inorder: HDIBEAJFCGpreorder: ABEHIECFJGpostorder: HIDEBJFGCA

void traverse(BinNode *T) { if(T != NULL) { PREORDER(T); traverse(T->left()); INORDER(T); traverse(T->right()); POSTORDER(T); }}

How much space is used?

Page 3: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Threaded Trees

• Traversals:

- Require extra memory, and

- Must be started from the root

• Use NULL pointers to store in-order predecessors and successors.

• Extra bit associated with each pointer marks whether it is a thread.

EG

D

E

B

A

H

G

Page 4: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Threaded Trees

EG

D

E

B

A

H

G

void inorder_succ(BinNode *T) { BinNode * next = T->right();

if(!next) return NULL; if(!is_thread(T->right)) { while(next->left() && is_thread(next->left) ) next = next->left(); }

return next;}

r1 r2

r

In general, in order successor = leftmost item in right subtree

Page 5: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Using Threads for Preorder

EG

D

E

B

A

H

G

void preorder_succ(BinNode *T) { if(T->left() && !is_thread(T->left) return T->left();

for(BinNode* next = T->right(); is_thread(next->right); next = next->right()) {}

return RC(P);}

preorder_succ(H) = right child of the lowest ancestor of H that both has H in its left subtree and has a right child.

Walk up right threads

Return right child

Page 6: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Serializing Trees

• Often want to write trees out to disk in a space efficient way.

• Preorder traversal will let you store the nodes.

- What’s the preorder traversal of this tree?

• Need to encode the structure somehow.

E H

D

F

B

A

G

C

Page 7: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Serializing Trees

• In preorder traversal, output a mark when you finish processing a node’s children.

• Leaves = empty children lists: ∅

• ∅ symbols are redundant:

- ABE))C)D)F)G)H))

• “)” means “go up one level”.

E H

D

F

B

A

G

∅ ∅ ∅

A B E ∅ ) ) D F ∅ ) G ∅ ) H ∅ ) ) C ∅ ) )

C

b d ae f g h c

Page 8: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Binary Search Trees

• BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree are > k.

• We disallow duplicate keys.

• Generalization of the binary search process we saw before:

- ordering

- partitioning

- linking

• Good for implementing the dictionary ADT we’ve already seen: insert, delete, find.

2 11

8

6

3

5

9

4

Page 9: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Sorted Set Problem

• If keys are totally ordered, dictionary ADT is sometimes extended to a “sorted set ADT.”

- Totally ordered means: for every pair (a,b) either a < b or b < a).

- Operations:• s = make_sorted_set

• find(s, k)

• insert(s, k)

• delete(s, k)

• join(s1, k, s2): make a new sorted set from s1, {k}, s2; destroy s1 and s2. Assumes every item in s1 < k and k < every time in s1.

• split(s, k): return 3 new sorted sets: s1 with items < k, {k}, and s2 with items > k.

Dictionary operations

Page 10: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

BST Find

2 11

8

6

3

5

9

4

Find k = 6:

Is k < 8?

Is k < 5? No, go right

Yes, go left

Page 11: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

BST Find

2 11

8

6

3

5

9

4

Find k = 9:

Is k < 8?

Is k < 5? No, go right

Yes, go left

No, go right

Is k < 11?

Page 12: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

BST Find

2 11

8

6

3

5

9

4

Find k = 13:

Is k < 8?

Is k < 5? No, go right

No, go right

Is k < 11?

NULL

No, go right

Page 13: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

insert(T, K): q = NULL p = T while p != NULL and p.key != K: q = p if p.key < K: p = p.left else if p.key > K: p = p.right

if p != NULL: error DUPLICATE

N = new Node(K) if q.data > K: q.left = N else: q.right = N

BST Insert

11

8

6

3

5

9

4

NULL

2

q

pq

pq

p

Same idea as BST Find

Page 14: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

insert(T, K): if T == NULL: T = new_node(K) T.left = new_external_node() T.right = new_external_node() else p = T while p.key != K and p.left != NULL: if p.key < K: p = p.left else if p.key > K: p = p.right

if p.left != NULL: error DUPLICATE

p.key = K p.left = new_external_node() p.right = new_external_node()

BST Insert with Extended Binary Trees

11

8

6

3

5

9

42

Page 15: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

1

BST FindMin

2 11

8

6

3

5

4

Walk left until you can’t go left any more

Can you express inorder_successor using find_min?

111

8

6

4

5

3

Page 16: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

BST Delete

2

3

5

6

2

3

5

4 2

3

5

Node is leaf: Node has 1 child:

2

3

5 5

2

4

2

3

5

6

4

Node has 2 children:

Page 17: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Python Implementation of BST

Page 18: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

How would you implement join and split?

Page 19: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Split(G)

G

B

A

Cab

cD

dE

eF

fg h g

hs1 s2

Gi=

F

f

join(s2,F,f)

E

e

join(s2, E, e)

Cjoin(s2, C, c)

c

A

a

join(a, A, s1)

B

b

join(b, B, s1)

D

d

join(d, D, s1)

Letters represent labels not keys

Page 20: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

• What’s the worst possible insertion order?

• What’s the best possible insertion order?

Page 21: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Expected Path Length of Random BST

• Suppose the keys k1, k2, k3, ..., kn are inserted in a random order (every permutation equally likely).

• What is the expected path length to a node in the BST built by inserting these keys?

• Idea: consider the leftmost path as a representative path.

• New key ki added to left-most path when it is the smallest encountered so far (ki < kj for j < i).

• In a random permutation, how often does the minimum change? [This is the length of the leftmost path]

Page 22: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Expected Path Length of Random BST

• What’s the probability that ki is the smallest so far?

• Pr[ki is smallest among k1,...ki] = 1/i

• Why?

• In a random permutation of k1,...ki, the minimum is equally likely to be in any one of the i positions.

• Probability it is in the last position = 1/i.

k1, k2, k3, k4, k5, k6, k7, k8

Page 23: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Expected Path Length of Random BST

• sum of xi = length of leftmost path.

• Expected length = E[∑xi] = ∑E[xi] = ∑[(1/i)1 + 0(1-1/i)] = ∑(1/i) = Hn = O(log n)

k1, k2, k3, k4, k5, k6, k7, k8

x1, x2, x3, x4, x5, x6, x7, x8

xi = 1 if ki is smallest among k1,...,ki

0 otherwise

keys =

random variables =

Page 24: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Expected Path Length of Random BST

(Dave Mount’s Figure)

Page 25: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Optimal Static BSTs – Cost of trees

• Define the cost of a tree built on keys kj,...,km:

• T is optimal if C(T) is smallest among any possible T containing the same keys.

• C(T) = expected cost of searching for a key in T.

C(T) = ∑ pi(Depth(T, ki) + 1)i = j

m

k1, k2, k3, k4, k5, k6, k7, k8 p1, p2, p3, p4, p5, p6, p7, p8

Why is it Depth + 1?

Page 26: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Subtrees of optimal tree are optimal trees• Goal: find tree that minimizes C(T).

• Claim: every subtree of optimal tree is optimal.

• Proof: Let T be an optimal tree on kj,...,km, with root = kr (j ≤ r ≤ m)

C(T) = pr + ∑pi(Depth(T, ki) + 1) + ∑pi(Depth(T, ki) + 1)

C(Tleft) C(Tright)

i=j

r-1

i=r+1

m

∑pi + ∑piDepth(T, ki) i=j

r-1

i=j

r-1∑pi + ∑piDepth(T, ki) i=j

r-1

i=j

r-1

C(T) = ∑pi + C(Tleft) + C(Tright)i=j

m

Page 27: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

So,

• If there were a lower cost Tleft or Tright we could reduce the total cost of T, contradicting that T is optimal.

• Hence, Tleft and Tright must be optimal.

C(T) = ∑pi + C(Tleft) + C(Tright)i=j

m

Page 28: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

Dynamic Programming to Find OPT Tree

k2, k3, k4, k5, k6, k7, k8

C[j, m] =

0 if m < j (tree is empty)pj if j = m (tree is single node)

∑pi + min { C[j, r-1] + C[r+1, m]} if j < mri=j

m

j mr

So: if you fill in the C[j, m] table from in order of increasing m-j, you’ll always have the value of C[j,m] computed when you need it.

Dynamic Programming

Page 29: Binary Search Treesckingsf/bioinfo-lectures/bst.pdf · Binary Search Trees • BST Property: If a node has key k then keys in the left subtree are < k and keys in the right subtree

The chosen values of r partition the nodes and give you the optimal tree structure.