B-Tree

B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number

B-Trees

A B-tree is a specialized multi-way search tree designed especially for use on disk.

In a B-tree each node may contain a large number of keys, so the number of subtrees of each node may also be large.

A B-tree is designed to branch out in this large number of directions and to hold many keys in each node so that the height of the tree stays relatively small.

Definitions

A B-tree of order m (the maximum number of children for each node) is a tree which satisfies the following properties:

1. Every node has at most m children.

2. Every node (except root and leaves) has at least ceil(m/2) children.

3. The root has at least two children if it is not a leaf node.

4. All leaves appear on the same level, and carry information.

5. A non-leaf node with k children contains k-1 keys.

6. Each leaf node (other than the root node if it is a leaf) must contain at least ceil(m/2) - 1 keys.

7. Keys and subtrees are arranged in the fashion of a search tree.
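The occupancy limits implied by these properties can be computed directly from the order m. The following is a minimal sketch; the class and method names (BTreeOrder, maxKeys, minChildren, minKeys) are made up for illustration and do not come from the slides.

```java
public class BTreeOrder {
    // Maximum keys in any node: one fewer than the maximum number of children (m).
    static int maxKeys(int m) { return m - 1; }

    // Minimum children of a non-root internal node: ceil(m / 2), in integer arithmetic.
    static int minChildren(int m) { return (m + 1) / 2; }

    // Minimum keys of a non-root node: ceil(m / 2) - 1.
    static int minKeys(int m) { return minChildren(m) - 1; }

    public static void main(String[] args) {
        int m = 5; // illustrative order
        System.out.println("order " + m
                + ": max keys = " + maxKeys(m)
                + ", min children = " + minChildren(m)
                + ", min keys = " + minKeys(m));
    }
}
```

For order 5 this gives at most 4 keys, at least 3 children for a non-root internal node, and at least 2 keys for a non-root node.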

Example--A B-tree of order 5

B-Tree -- Search

Search is performed in the typical manner, analogous to that in a binary search tree. Starting at the root, the tree is traversed top to bottom, at each node following the child pointer whose separation values lie on either side of the value being searched for.

Binary search is typically (but not necessarily) used within nodes to find the separation values and child tree of interest.
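The search procedure can be sketched as follows. This is only an illustration under assumptions: the Node class, its list-based layout, and the integer keys are hypothetical, and binary search within a node is done with the standard Collections.binarySearch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BTreeSearch {
    // Hypothetical in-memory node: sorted keys plus child pointers (empty for a leaf).
    static class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
        Node(Integer... ks) { keys.addAll(Arrays.asList(ks)); }
        boolean isLeaf() { return children.isEmpty(); }
    }

    // Walks from the root towards a leaf, using binary search inside each node.
    static boolean contains(Node node, int key) {
        while (node != null) {
            int pos = Collections.binarySearch(node.keys, key);
            if (pos >= 0) return true;          // key found in this node
            if (node.isLeaf()) return false;    // unsuccessful search ends at a leaf
            node = node.children.get(-pos - 1); // insertion point doubles as the child index
        }
        return false;
    }

    public static void main(String[] args) {
        Node root = new Node(10, 20);
        root.children.add(new Node(3, 7));
        root.children.add(new Node(12, 15));
        root.children.add(new Node(25, 30));
        System.out.println(contains(root, 15)); // true
        System.out.println(contains(root, 8));  // false
    }
}
```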

B-Tree Insertion

When inserting an item, first do a search for it in the B-tree. If the item is not already in the B-tree, this unsuccessful search will end at a leaf.

If there is room in this leaf, just insert the new item here. Note that this may require that some existing keys be moved one to the right to make room for the new item.

If instead this leaf node is full so that there is no room to add the new item, then the node must be "split" with about half of the keys going into a new node to the right of this one. The median (middle) key is moved up into the parent node. (Of course, if that node has no room, then it may have to be split as well.) Note that when adding to an internal node, not only might we have to move some keys one position to the right, but the associated pointers have to be moved right as well.

If the root node is ever split, the median key moves up into a new root node, thus causing the tree to increase in height by one.
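The insertion steps above can be sketched in code. This is a minimal in-memory version under assumptions, not a disk-based implementation: the class BTreeInsert, its Node layout, the fixed ORDER of 5, and the render helper are all hypothetical. A node is allowed to overflow by one key temporarily and is then split around its median, which the caller absorbs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BTreeInsert {
    static final int ORDER = 5; // assumed order: max children; max keys = ORDER - 1

    static class Node {
        List<Character> keys = new ArrayList<>();
        List<Node> children = new ArrayList<>(); // empty for a leaf
    }

    Node root = new Node();

    void insert(char key) {
        Node up = insertInto(root, key);
        if (up != null) root = up; // root was split: the tree grows by one level
    }

    // Returns null, or a one-key node (median plus the two halves) if 'node' had to split.
    static Node insertInto(Node node, char key) {
        int pos = Collections.binarySearch(node.keys, key);
        if (pos >= 0) return null;            // already present: nothing to do
        int i = -pos - 1;                     // slot for the key, or index of the child to descend into
        if (node.children.isEmpty()) {
            node.keys.add(i, key);            // shifts larger keys one position right
        } else {
            Node up = insertInto(node.children.get(i), key);
            if (up == null) return null;      // child absorbed the key without splitting
            node.keys.add(i, up.keys.get(0)); // median moves up into this node
            node.children.set(i, up.children.get(0));
            node.children.add(i + 1, up.children.get(1)); // pointers shift right too
        }
        return node.keys.size() < ORDER ? null : split(node);
    }

    // Splits an overfull node; the left half stays in 'node', the right half goes to a new node.
    static Node split(Node node) {
        int mid = node.keys.size() / 2;
        Character median = node.keys.get(mid);
        Node right = new Node();
        right.keys.addAll(node.keys.subList(mid + 1, node.keys.size()));
        node.keys.subList(mid, node.keys.size()).clear();
        if (!node.children.isEmpty()) {
            right.children.addAll(node.children.subList(mid + 1, node.children.size()));
            node.children.subList(mid + 1, node.children.size()).clear();
        }
        Node up = new Node();
        up.keys.add(median);
        up.children.add(node);
        up.children.add(right);
        return up;
    }

    // Renders a node as [keys](children...) for inspection.
    static String render(Node n) {
        StringBuilder sb = new StringBuilder("[");
        for (char k : n.keys) sb.append(k);
        sb.append("]");
        if (!n.children.isEmpty()) {
            sb.append("(");
            for (Node c : n.children) sb.append(render(c));
            sb.append(")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BTreeInsert tree = new BTreeInsert();
        for (char c : "CNGAH".toCharArray()) tree.insert(c);
        System.out.println(render(tree.root)); // [G]([AC][HN])
    }
}
```

Inserting C, N, G, A fills a single node; inserting H overflows it, so the median G moves up into a new root with leaves A C and H N.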

Insertion Example

Insert the following letters into what is originally an empty B-tree of order 5: C N G A H E K Q M F W L T Z D P R X Y S

Order 5 means that a node can have a maximum of 5 children and 4 keys. All nodes other than the root must have a minimum of 2 keys.

Insertion Example -- continued

The first 4 letters get inserted into the same node, which then holds A C G N.

Insert H (no room in above node, split it into 2 nodes, move median G up into a new root node)

Insertion Example -- continued

Insert E, K, and Q

Insert M (split the node, M is median, move up)

Insertion Example -- continued

Insert F, W, L and T

Insert Z (Split, move median T up)

Insertion Example -- continued

Insert D (Split, move median D up), then insert P, R, X, Y

Insert S (Split, move median Q up, Split, move median M up)

B-Tree Deletion

Locate and delete the item, then restructure the tree to regain its invariants.

There are two special cases to consider when deleting an element:

1. the element in an internal node may be a separator for its child nodes

2. deleting an element may put its node under the minimum number of elements and children

B-Tree Deletion

Search for the value to delete

If the value is in an internal node, choose a new separator (either the largest element in the left subtree or the smallest element in the right subtree), remove it from the leaf node it is in, and replace the element to be deleted with the new separator. The leaf that gave up the separator is then handled like the leaf case below.

If the value is in a leaf node, it can simply be deleted from the node, perhaps leaving the node with too few elements; so some additional changes to the tree will be required

B-Tree Deletion

Additional changes -- Rebalancing after deletion

If the right sibling has more than the minimum number of elements: borrow one, adjust the separator.

Otherwise, if the left sibling has more than the minimum number of elements: borrow one, adjust the separator.

If both immediate siblings have only the minimum number of elements:

* Create a new node with all the elements from the deficient node, all the elements from one of its siblings, and the separator in the parent between the two combined sibling nodes.

* Remove the separator from the parent, and replace the two children it separated with the combined node.

* If that brings the number of elements in the parent under the minimum, repeat these steps with that deficient node, unless it is the root, since the root is allowed to be deficient.
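One rebalancing step can be sketched as below. Several things here are assumptions made for illustration: the class BTreeRebalance, its Node layout, and the fixed MIN_KEYS of 2 (an order-5 tree). The sketch also merges into the left node of the pair in place rather than allocating a new node, and it prefers the left sibling when merging. Repeating the step on the parent, and replacing a root that has become empty, is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

public class BTreeRebalance {
    static final int MIN_KEYS = 2; // ceil(5 / 2) - 1, assuming an order-5 tree

    static class Node {
        List<Character> keys = new ArrayList<>();
        List<Node> children = new ArrayList<>(); // empty for a leaf
        Node(String ks) { for (char c : ks.toCharArray()) keys.add(c); }
    }

    // Repairs parent.children.get(i), which has fallen below MIN_KEYS.
    static void rebalance(Node parent, int i) {
        Node node = parent.children.get(i);
        Node left = i > 0 ? parent.children.get(i - 1) : null;
        Node right = i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;
        if (right != null && right.keys.size() > MIN_KEYS) {
            // Borrow from the right: separator moves down, right's smallest key moves up.
            node.keys.add(parent.keys.get(i));
            parent.keys.set(i, right.keys.remove(0));
            if (!right.children.isEmpty()) node.children.add(right.children.remove(0));
        } else if (left != null && left.keys.size() > MIN_KEYS) {
            // Borrow from the left: separator moves down, left's largest key moves up.
            node.keys.add(0, parent.keys.get(i - 1));
            parent.keys.set(i - 1, left.keys.remove(left.keys.size() - 1));
            if (!left.children.isEmpty())
                node.children.add(0, left.children.remove(left.children.size() - 1));
        } else {
            // Merge two siblings and the separator between them into the left one.
            int sep = left != null ? i - 1 : i;
            Node a = parent.children.get(sep);
            Node b = parent.children.get(sep + 1);
            a.keys.add(parent.keys.remove(sep));
            a.keys.addAll(b.keys);
            a.children.addAll(b.children);
            parent.children.remove(sep + 1);
        }
    }

    public static void main(String[] args) {
        Node parent = new Node("QW");
        parent.children.add(new Node("NP"));
        parent.children.add(new Node("S"));   // deficient leaf
        parent.children.add(new Node("XYZ"));
        rebalance(parent, 1);
        System.out.println(parent.keys + " " + parent.children.get(1).keys
                + " " + parent.children.get(2).keys); // [Q, X] [S, W] [Y, Z]
    }
}
```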

B-Tree Deletion Example

Delete H

Deletion Example -- Continued

Delete T (internal node, select the smallest element from the right subtree to replace T)

Deletion Example -- Continued

Delete R (leaf node, need rebalance after the deletion)

Borrow a key from the right sibling and adjust the separator: move W down to join S, move X up to the parent

Deletion Example -- Continued

Delete E (leaf node, need rebalance after deletion)

The left and right siblings have only the minimum number of keys. Create a new node: combine the left sibling, the separator from the parent, and the deficient node

Deletion Example -- Continued

Continue rebalance

The sibling has only the minimum number of keys. Create a new node: combine the deficient node with the separator from the parent and the right sibling

2-3 B-Trees, or simply 2-3 trees

Properties

• ternary tree - 3 or fewer children per node
• each node is either a 2-node or 3-node (subtree count)
• 2-nodes contain 1 value and 3-nodes contain 2 sorted values
• BST property holds for node content & left, mid, right subtrees
• all leaves are on the same level

B-Tree

A B-tree is kept balanced by requiring that all external nodes are at the same depth. This depth increases slowly as elements are added to the tree; an increase in the overall depth is infrequent, and results in all leaf nodes being one node farther from the root.

B-trees have substantial advantages over alternative implementations when node access times far exceed access times within nodes. This usually occurs when most nodes are in secondary storage such as hard drives. By maximizing the number of child nodes within each internal node, the height of the tree decreases, balancing occurs less often, and efficiency increases. Usually this value is set such that each node takes up a full disk block or an analogous size in secondary storage.

2-3 B-trees: useful in main memory

2-3 Tree Implementation

public class TwoThreeTree<Content> {
    private boolean is2node;                // node kind: 2-node or 3-node
    private Content smallContent;           // values held in the node
    private Content bigContent;

    private TwoThreeTree<Content> left;     // subtrees
    private TwoThreeTree<Content> mid;
    private TwoThreeTree<Content> right;

    private TwoThreeTree<Content> parent;   // link to the parent node
    ...
}

B+-Tree

Ways to improve a B-tree

• keep all values in the leaves
• form a linked list of leaf nodes

How do these modifications change the performance of

...a search?

...an insertion or removal?

B+ Tree

The B+ tree is a variant of the B-tree in which all records are stored at the leaf level of the tree; only keys are stored in interior nodes.

A B-tree can store both keys and records in its interior nodes; in this sense, the B+ tree is a specialization of the B-tree.
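Because every record sits in a leaf and the leaves form a linked list, a range query can walk the leaf chain without going back up through interior nodes. The sketch below illustrates only that walk; the Leaf class, its next field, and the range method are hypothetical, and locating the starting leaf through the interior nodes is assumed to have happened already.

```java
import java.util.ArrayList;
import java.util.List;

public class BPlusLeafScan {
    // Hypothetical leaf: sorted keys (records would sit alongside) and a link to the next leaf.
    static class Leaf {
        final int[] keys;
        Leaf next;
        Leaf(int... keys) { this.keys = keys; }
    }

    // Collects every key in [lo, hi], starting at the leaf where lo would be found.
    static List<Integer> range(Leaf start, int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        for (Leaf leaf = start; leaf != null; leaf = leaf.next) {
            for (int k : leaf.keys) {
                if (k > hi) return out; // keys are sorted across the chain, so stop here
                if (k >= lo) out.add(k);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Leaf a = new Leaf(1, 4);
        Leaf b = new Leaf(7, 9);
        Leaf c = new Leaf(12, 15);
        a.next = b;
        b.next = c;
        System.out.println(range(a, 4, 12)); // [4, 7, 9, 12]
    }
}
```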