Variations on Balanced Trees: Lazy Red-Black Trees

Stefan Kahrs

Overview

• some general introduction to BSTs
• some specific observations on red-black trees
• how we can make them lazy, and why we may want to
• conclusions

Binary Search Trees

• commonly used data structure to implement sets or finite maps (only keys shown):

[Figure: a small example BST with root 56; further keys 33 and 227 (layout not recoverable)]
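Below is a minimal Java sketch of such a set (Java is the language the talk's implementation used). It assumes int keys, and the class and method names are made up for illustration. Search and insertion each follow a single root-to-leaf path, so their cost is the depth of the tree.

```java
// Sketch: a plain, unbalanced binary search tree used as a set of int keys.
// Class and method names are illustrative, not taken from the talk.
final class BasicBst {
    private static final class Node {
        final int key;
        Node left, right;

        Node(int key) { this.key = key; }
    }

    private Node root;

    // Walks down from the root and attaches a new leaf.
    // Returns false if the key is already present (a set stores it once).
    boolean insert(int key) {
        if (root == null) {
            root = new Node(key);
            return true;
        }
        Node n = root;
        while (true) {
            if (key < n.key) {
                if (n.left == null) { n.left = new Node(key); return true; }
                n = n.left;
            } else if (key > n.key) {
                if (n.right == null) { n.right = new Node(key); return true; }
                n = n.right;
            } else {
                return false;
            }
        }
    }

    // Search follows a single root-to-leaf path; its cost is the depth reached.
    boolean contains(int key) {
        Node n = root;
        while (n != null) {
            if (key < n.key) n = n.left;
            else if (key > n.key) n = n.right;
            else return true;
        }
        return false;
    }
}
```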

A problem with ordinary BSTs

• on random data, searching, inserting or deleting an entry takes O(log(n)) time, where n is the number of entries, but...

• if the data is biased (for example, inserted in sorted order), this can deteriorate to O(n) per operation

• ...and thus building a tree of n entries can deteriorate to O(n^2)
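The quadratic case is easy to reproduce. This sketch uses a hypothetical class, BstDegeneration, that builds a plain BST and counts the nodes visited. With keys inserted in sorted order, every new key walks the whole right spine.

```java
// Sketch: count the nodes visited while building a plain BST, to show how
// biased (here: sorted) input degrades insertion. Names are illustrative.
final class BstDegeneration {
    private static final class Node {
        final int key;
        Node left, right;

        Node(int key) { this.key = key; }
    }

    // Inserts the keys, in order, into an empty BST and returns the total
    // number of existing nodes visited (one key comparison each).
    static long buildCost(int[] keys) {
        Node root = null;
        long visits = 0;
        for (int key : keys) {
            if (root == null) {
                root = new Node(key);
                continue;
            }
            Node n = root;
            while (true) {
                visits++;
                if (key < n.key) {
                    if (n.left == null) { n.left = new Node(key); break; }
                    n = n.left;
                } else if (key > n.key) {
                    if (n.right == null) { n.right = new Node(key); break; }
                    n = n.right;
                } else {
                    break; // duplicate key: nothing to insert
                }
            }
        }
        return visits;
    }
}
```

For the keys 0..999 in sorted order, buildCost returns 1 + 2 + ... + 999 = 999 * 1000 / 2 = 499500. A shuffled copy of the same keys gives a far smaller count.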

Therefore...

• people have come up with various schemes that make trees self-balance

• the idea is always that insertion/deletion pay an O(log(n)) tax to maintain an invariant

• the invariant guarantees that search or insert or delete all perform in logarithmic time

Well-known invariants for trees

• Braun trees: the sizes of the left and right subtrees differ by at most 1; too strong for search trees (O(n^0.58))

• AVL trees: the depths of the left and right subtrees differ by at most 1

• 2-3-4 trees: a node has 1 to 3 keys and 2 to 4 subtrees (a special case of B-trees)

• Red-Black trees: an indirect realisation of 2-3-4 trees

Red-Black Tree

• BST with an additional colour field which can be RED or BLACK

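• invariant 1: red nodes have only black children; the root and nil are black

• thus, a non-empty black node has between 2 and 4 nearest black descendants (its black children plus the black children of its red children), just as a 2-3-4 node has 2 to 4 subtrees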

• invariant 2: all paths to leaves go through the same number of black nodes
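Both invariants can be checked mechanically. The sketch below assumes a simple node class with a boolean colour field; the names are illustrative. Nil (null) children count as black, as invariant 1 requires.

```java
// Sketch: check the two red-black invariants. The Node layout and the method
// names are assumptions for illustration; nil (null) children count as black.
final class RbCheck {
    static final class Node {
        final int key;
        final boolean red;
        final Node left, right;

        Node(int key, boolean red, Node left, Node right) {
            this.key = key; this.red = red; this.left = left; this.right = right;
        }
    }

    static boolean isRed(Node n) { return n != null && n.red; }

    // Returns the number of black nodes on every path from n down to nil
    // (nil included), or -1 if either invariant is violated below n.
    static int blackHeight(Node n) {
        if (n == null) return 1;
        if (n.red && (isRed(n.left) || isRed(n.right))) return -1; // invariant 1
        int l = blackHeight(n.left);
        int r = blackHeight(n.right);
        if (l < 0 || r < 0 || l != r) return -1;                    // invariant 2
        return l + (n.red ? 0 : 1);
    }

    static boolean isRedBlack(Node root) {
        return !isRed(root) && blackHeight(root) > 0; // the root must be black
    }
}
```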

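Example

[Figure: an example red-black tree with root 68, children 12 and 83, grandchildren 7, 43, 75 and 96, and the keys 70, 76 below 75 and 94, 98 below 96; node colours not recoverable]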

Perceived Wisdom

• Red-Black trees are cheaper to maintain than AVL trees, though they may not be quite as balanced

• still pretty balanced: in the worst case, the average path length of a Red-Black tree is only 5% longer than that of a Braun tree

Aside: a problem with balanced trees

• on random data, an ordinary BST has an average path length of about 2 ln(n)

• this is only 38% longer than the average path length of a Braun tree

• thus most balanced tree schemes lose against an ordinary BST on random data: those 38% are not enough to cover the balancing tax they pay

• red-black trees succeed though
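The 38% comes from the following leading-order calculation. It assumes that a Braun tree, being perfectly balanced, has an average path length of about log2(n):

```latex
\frac{2\ln n}{\log_2 n} \;=\; \frac{2\ln n}{\ln n / \ln 2} \;=\; 2\ln 2 \;\approx\; 1.386
```

So paths in a random BST are roughly 1.39 times as long, i.e. about 38% longer.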

Algorithms on RB trees

• search: unchanged, ignores colour
• insert:
  – insert as in a BST (as a fresh red node)
  – rotate subtrees until the colour violation goes away
  – colour the root black

• delete (more complex than insert):
  – delete as in a BST
  – if there is an underflow, rotate from siblings until the underflow goes away

Example

[Figure: the example tree with 69 inserted as a fresh red node below 70; node colours not recoverable]

Example

[Figure: the example tree with 69 inserted, next step of the insertion; node colours not recoverable]

Example

[Figure: a further step of inserting 69 into the example tree; restructured layout and node colours not recoverable]

Standard Imperative Algorithm

• find the place of insertion in a loop
• check with your parent whether you’re a naughty child, and correct the behaviour if necessary, going up the tree
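For comparison, here is a Java sketch of this standard bottom-up insertion. It follows the textbook fix-up of Cormen et al. and is not taken from the talk; class and method names are illustrative. Note the parent field: the fix-up loop climbs the tree through z.parent.

```java
// Sketch of the standard imperative insertion with parent pointers, following
// the textbook bottom-up fix-up (as in Cormen et al.), not the author's code.
final class ParentPointerRB {
    static final class Node {
        int key;
        boolean red = true;        // new nodes start red
        Node left, right, parent;  // every link is stored in both directions

        Node(int key, Node parent) { this.key = key; this.parent = parent; }
    }

    Node root;

    void insert(int key) {
        // 1. find the place of insertion in a loop
        Node p = null, n = root;
        while (n != null) {
            p = n;
            if (key < n.key) n = n.left;
            else if (key > n.key) n = n.right;
            else return; // already present
        }
        Node z = new Node(key, p);
        if (p == null) root = z;
        else if (key < p.key) p.left = z;
        else p.right = z;

        // 2. go up the tree while z is a red child of a red parent
        while (z.parent != null && z.parent.red) {
            Node g = z.parent.parent; // exists: a red parent is never the (black) root
            boolean parentIsLeft = z.parent == g.left;
            Node uncle = parentIsLeft ? g.right : g.left;
            if (uncle != null && uncle.red) { // recolour and continue from g
                z.parent.red = false;
                uncle.red = false;
                g.red = true;
                z = g;
            } else if (parentIsLeft) {
                if (z == z.parent.right) { z = z.parent; rotateLeft(z); }
                z.parent.red = false;
                g.red = true;
                rotateRight(g);
            } else {
                if (z == z.parent.left) { z = z.parent; rotateRight(z); }
                z.parent.red = false;
                g.red = true;
                rotateLeft(g);
            }
        }
        root.red = false; // colour the root black
    }

    private void rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        if (y.left != null) y.left.parent = x;
        replaceChild(x, y);
        y.left = x;
        x.parent = y;
    }

    private void rotateRight(Node x) {
        Node y = x.left;
        x.left = y.right;
        if (y.right != null) y.right.parent = x;
        replaceChild(x, y);
        y.right = x;
        x.parent = y;
    }

    // Makes y take x's place under x's parent (or as the root).
    private void replaceChild(Node x, Node y) {
        y.parent = x.parent;
        if (x.parent == null) root = y;
        else if (x == x.parent.left) x.parent.left = y;
        else x.parent.right = y;
    }

    // Black height (nil counts as 1), or -1 if an invariant is violated.
    static int blackHeight(Node n) {
        if (n == null) return 1;
        boolean redRed = n.red && ((n.left != null && n.left.red) || (n.right != null && n.right.red));
        int l = blackHeight(n.left), r = blackHeight(n.right);
        if (redRed || l < 0 || l != r) return -1;
        return l + (n.red ? 0 : 1);
    }
}
```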

Problem with this

Question: how do you go up the tree?
Answer: children should know their parent.

Which means: trees in imperative implementations are often not proper trees; every link consists of two pointers.

Functional Implementations

• in a pure FP language such as Haskell you don’t have pointer comparison and so parent pointers won’t work

• instead we do something like this:
    insert x tree = repair (simplInsert x tree)

• simplInsert inserts the data into a subtree and produces a tree with a potential invariant violation at the top; repair fixes that

• the ancestors sit on the recursion stack
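Below is a Java sketch of insert x tree = repair (simplInsert x tree), using immutable nodes as a functional language would. The slides do not say which repair the author used. This sketch assumes Okasaki's four-case balance, which turns a black node with a red child and red grandchild into a red node with two black children. The names insert, simplInsert and repair follow the slide; add and blackHeight are hypothetical helpers.

```java
// Sketch of insert x tree = repair (simplInsert x tree) with immutable nodes,
// as a pure functional language would do it. repair() is assumed to be
// Okasaki's balance; add() and blackHeight() are hypothetical helpers.
final class FunctionalRB {
    static final class Node {
        final boolean red;
        final Node left, right;
        final int key;

        Node(boolean red, Node left, int key, Node right) {
            this.red = red; this.left = left; this.key = key; this.right = right;
        }
    }

    static boolean isRed(Node t) { return t != null && t.red; }

    // insert x tree = repair (simplInsert x tree)
    static Node insert(int x, Node t) {
        return repair(simplInsert(x, t));
    }

    // Inserts x into the proper subtree (recursively) and rebuilds this node.
    // A fresh node is red, so a red-red violation may now sit just below t.
    static Node simplInsert(int x, Node t) {
        if (t == null) return new Node(true, null, x, null);
        if (x < t.key) return new Node(t.red, insert(x, t.left), t.key, t.right);
        if (x > t.key) return new Node(t.red, t.left, t.key, insert(x, t.right));
        return t; // already present
    }

    // If t is black and has a red child with a red child, rebuild the four
    // possible shapes as R (B a x b) y (B c z d); black height is unchanged.
    static Node repair(Node t) {
        if (t.red) return t;
        Node l = t.left, r = t.right;
        if (isRed(l) && isRed(l.left))
            return build(l.left.left, l.left.key, l.left.right, l.key, l.right, t.key, r);
        if (isRed(l) && isRed(l.right))
            return build(l.left, l.key, l.right.left, l.right.key, l.right.right, t.key, r);
        if (isRed(r) && isRed(r.left))
            return build(l, t.key, r.left.left, r.left.key, r.left.right, r.key, r.right);
        if (isRed(r) && isRed(r.right))
            return build(l, t.key, r.left, r.key, r.right.left, r.right.key, r.right.right);
        return t;
    }

    private static Node build(Node a, int x, Node b, int y, Node c, int z, Node d) {
        return new Node(true, new Node(false, a, x, b), y, new Node(false, c, z, d));
    }

    // Top level: colour the root black afterwards.
    static Node add(int x, Node root) {
        Node t = insert(x, root);
        return t.red ? new Node(false, t.left, t.key, t.right) : t;
    }

    // Black height (nil counts as 1), or -1 if an invariant is violated.
    static int blackHeight(Node t) {
        if (t == null) return 1;
        if (t.red && (isRed(t.left) || isRed(t.right))) return -1;
        int l = blackHeight(t.left), r = blackHeight(t.right);
        return (l < 0 || l != r) ? -1 : l + (t.red ? 0 : 1);
    }
}
```

The recursion stack of insert/simplInsert plays the role of the parent pointers: repair runs on each ancestor as the calls return.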

Recursion

• actually, nothing stops us from doing likewise in an imperative language, using recursive insertion (or deletion)

• cost: recursive calls rather than loops
• benefit: no parent pointers, which saves memory and makes all rotations cheaper
• it is still more expensive, though...

Can we do better?

• problem is that the recursive insertion algorithm is not tail-recursive and thus not directly loopifiable: we repair after we insert

• what if we turn this around?
    newinsert x tree = simplInsert x (repair tree)
• this is the fundamental idea behind lazy red-black trees

What does that mean?

• we allow colour violations to happen in the first place

• these violations remain in the tree
• we repair them when we are about to revisit a node
• this is all nicely loopifiable and requires no parent pointers

In the imperative code

• where we used to have
    n = n.left;
  to continue in the left branch
• we now have:
    n = n.left = n.left.repair();
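Below is a sketch of the lazy loop in Java, built around the repair idiom on this slide. It assumes int keys, a mutable Node class, and a repair() that applies Okasaki-style restructuring at a black node; the talk does not show the author's repair. Black height is preserved throughout. A red node produced by repair() may end up below a red parent, and that violation is deliberately left in place.

```java
// Sketch of lazy insertion: each child is repaired just before we step into
// it, so the loop never has to climb back up and needs no parent pointers.
// The body of repair() is an assumption (Okasaki-style restructuring at a
// black node); the talk does not show the author's version.
final class LazyRB {
    static final class Node {
        int key;
        boolean red = true; // fresh nodes are red
        Node left, right;

        Node(int key) { this.key = key; }

        // Fixes a red-red pair directly below this black node and returns the
        // new subtree root. Black height is unchanged, but a red result may
        // now sit below a red parent; that violation stays in the tree.
        Node repair() {
            if (red) return this;
            Node x, y, z; // afterwards: y on top, keys x < y < z
            if (isRed(left) && isRed(left.left)) {
                z = this; y = left; x = y.left;
                z.left = y.right;
            } else if (isRed(left) && isRed(left.right)) {
                z = this; x = left; y = x.right;
                x.right = y.left; z.left = y.right;
            } else if (isRed(right) && isRed(right.left)) {
                x = this; z = right; y = z.left;
                x.right = y.left; z.left = y.right;
            } else if (isRed(right) && isRed(right.right)) {
                x = this; y = right; z = y.right;
                x.right = y.left;
            } else {
                return this; // nothing to fix here
            }
            y.left = x; y.right = z;
            x.red = false; z.red = false; y.red = true;
            return y;
        }
    }

    static boolean isRed(Node n) { return n != null && n.red; }

    Node root;

    void insert(int key) {
        if (root == null) { root = new Node(key); root.red = false; return; }
        root = root.repair();
        root.red = false; // recolouring the root adds one black node to every path
        Node n = root;
        while (true) {
            if (key < n.key) {
                if (n.left == null) { n.left = new Node(key); return; } // no fix-up
                n = n.left = n.left.repair();
            } else if (key > n.key) {
                if (n.right == null) { n.right = new Node(key); return; }
                n = n.right = n.right.repair();
            } else {
                return; // already present
            }
        }
    }

    // Search is unchanged and ignores colour.
    boolean contains(int key) {
        Node n = root;
        while (n != null) {
            if (key < n.key) n = n.left;
            else if (key > n.key) n = n.right;
            else return true;
        }
        return false;
    }

    // Black height (nil counts as 1), or -1 if paths disagree. Red-red pairs
    // are tolerated here, since the lazy tree allows them.
    static int blackHeight(Node n) {
        if (n == null) return 1;
        int l = blackHeight(n.left), r = blackHeight(n.right);
        return (l < 0 || l != r) ? -1 : l + (n.red ? 0 : 1);
    }
}
```

The line n = n.left = n.left.repair() works as intended in Java. The field n.left is resolved on the old n before the right-hand side runs, so the repaired child is stored in the parent and then becomes the current node.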

Invariants?

• the standard red-black tree invariant is broken with this (affects search)

• in principle, we can have B-R-R-B-R-R-B-R-R paths, though these are rare

• but this is as bad as it gets, so we do have an invariant that guarantees O(log(n))

• average path lengths are similar to RB trees
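A short derivation of why O(log(n)) still holds. Let b be the number of black nodes on every root-to-leaf path, and h the number of nodes on the longest path. A tree in which every path contains b black nodes has at least 2^b - 1 nodes; this does not depend on where the red nodes sit. Assume, as stated above, that a black node is followed by at most two red nodes:

```latex
\begin{aligned}
n &\ge 2^{b} - 1 &&\Longrightarrow\quad b \le \log_2(n+1)\\
h &\le 3b \le 3\log_2(n+1) &&\text{(lazy: at most two reds after each black node)}\\
h &\le 2b \le 2\log_2(n+1) &&\text{(standard red-black tree, for comparison)}
\end{aligned}
```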

Performance?

• I implemented this in Java, and the performance data were initially inconclusive (JIT compiler, garbage collection)

• after forcing gc between tests, standard RB remains faster (40% faster on random inputs), though this may still be tweakable

• so what is the extra cost, and can we do anything about it?

Checks!

• most nodes we visit and check are fine
• especially high up in the tree, as these are constantly repaired
• ...and the ones low down do not matter that much anyway
• so we could move from regular maintenance to student-flat maintenance, i.e. repair trees only once in a blue moon

What?

• yes, the colour invariant goes to pot with that
• we do maintain black height though...
• ...and trust the healing powers of occasional repair: suppose we have a biased insertion sequence and don’t repair for a while...
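Below is a sketch of student-flat maintenance. It assumes the scheme means running a repair-mode insertion only on every period-th insertion, and otherwise attaching a red leaf without any checks. A red leaf never changes the number of black nodes on any path, so black height is maintained even between repairs. The class OccasionalRB, the period parameter and the Okasaki-style repair body are all hypothetical.

```java
// Sketch of "student-flat" maintenance, assuming it means: only every
// period-th insertion runs in repair mode (repairing each child before
// stepping into it); all other insertions just attach a red leaf without any
// checks. A red leaf never changes black height, so that invariant survives.
// OccasionalRB, period and the repair body (Okasaki-style) are assumptions.
final class OccasionalRB {
    static final class Node {
        int key;
        boolean red = true;
        Node left, right;

        Node(int key) { this.key = key; }
    }

    static boolean isRed(Node n) { return n != null && n.red; }

    // Restructures a red-red pair directly below the black node t and returns
    // the new subtree root (black height unchanged). Red nodes are left alone.
    static Node repair(Node t) {
        if (t.red) return t;
        Node x, y, z; // afterwards: y on top, keys x < y < z
        if (isRed(t.left) && isRed(t.left.left)) {
            z = t; y = t.left; x = y.left;
            z.left = y.right;
        } else if (isRed(t.left) && isRed(t.left.right)) {
            z = t; x = t.left; y = x.right;
            x.right = y.left; z.left = y.right;
        } else if (isRed(t.right) && isRed(t.right.left)) {
            x = t; z = t.right; y = z.left;
            x.right = y.left; z.left = y.right;
        } else if (isRed(t.right) && isRed(t.right.right)) {
            x = t; y = t.right; z = y.right;
            x.right = y.left;
        } else {
            return t;
        }
        y.left = x; y.right = z;
        x.red = false; z.red = false; y.red = true;
        return y;
    }

    private final int period; // 1 = repair mode on every insertion
    private long count = 0;
    Node root;

    OccasionalRB(int period) { this.period = period; }

    void insert(int key) {
        boolean repairMode = count++ % period == 0;
        if (root == null) { root = new Node(key); root.red = false; return; }
        if (repairMode) { root = repair(root); root.red = false; }
        Node n = root;
        while (key != n.key) {
            boolean goLeft = key < n.key;
            Node child = goLeft ? n.left : n.right;
            if (child == null) child = new Node(key);   // fresh red leaf
            else if (repairMode) child = repair(child); // lazy repair on the way down
            if (goLeft) n.left = child; else n.right = child;
            n = child;
        }
    }

    // Black height (nil counts as 1), or -1 if paths disagree.
    static int blackHeight(Node n) {
        if (n == null) return 1;
        int l = blackHeight(n.left), r = blackHeight(n.right);
        return (l < 0 || l != r) ? -1 : l + (n.red ? 0 : 1);
    }
}
```

With new OccasionalRB(1), every insertion runs in repair mode, which is the lazy scheme with regular maintenance. With new OccasionalRB(100), only 1 insertion in 100 repairs anything.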

Example

[Figure: a tree containing the keys 7, 12, 43, 70, 83 and 96; shape and colours not recoverable]

Suppose the tree has this shape, and now we insert a 5 in repair mode.

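Result

[Figure: the tree after inserting 5 in repair mode, now containing the keys 5, 7, 12, 43, 70, 83 and 96; shape and colours not recoverable]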

Findings

• on random data, the performance of lazy red-black trees is virtually unaffected, even if we perform a safe-insert for only 1 insertion in 100

• on biased data, lazy RB trees work a bit better under student-flat maintenance, but still lose to RB (15% slower for this bias)

• average tree depth: 1.5 longer than RB
  – on random inputs
  – also on biased inputs (where a plain BST falls off the cliff)

Conclusions

• Ultimately: failure!
• Lazy RB trees are not faster than normal ones.
• On random inputs, lazy RB trees perform very similarly to plain BSTs.
• There is some small room for improvement, though I doubt the gap to plain RB can be closed.
• Perhaps other algorithms would benefit more from lazy invariant maintenance?