Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
1
Tree-Structured Indexes
Yanlei DiaoUMass Amherst
Feb 28, 2007
Slides Courtesy of R. Ramakrishnan and J. Gehrke
2
B+ Tree: Most Widely Used IndexHeight-balanced given arbitrary inserts/deletes.
Fanout: # child pointers of a non-leaf node F = avg. fanout
Height: N = # leaf pagesH = Log F N(Root: level 0, …, Leaf: level H)
Index Entries
Data Entries("Sequence set")
(Direct search)
3
B+ Tree: Most Widely Used IndexMinimum 50% occupancy (except for the root).
Order of the tree (n): max # of keys in a node.• Can be computed using the node size, key size, pointer size.
Each non-root node contains [ ⎡n/2⎤, n ] entries, i.e., at least half full. Root node can have [1, n] entries.
Index Entries
Data Entries("Sequence set")
(Direct search)
4
B+ Trees in PracticeTypical order is 200. Typical fill-factor (occupancy)is 67%.
Average fanout = 133Level 0=1 page; Level 1=133 pages; Level 2=1332 pages...
Typical capacities:Height 3: 1333 = 2,352,637 recordsHeight 4: 1334 = 312,900,700 records
5
Searches in a B+ Tree
Search begins at root, and key comparisons direct it to a leaf.Search for 5*, 15*, all data entries >= 24* ...
Root
17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
13
<13≥13 <17
≥17 <24 ≥24 <30≥30
6
Inserting a Data Entry into a B+ TreeFind correct leaf L via a top-down search. Put data entry onto L.
If L has enough space, done!Else, must split L (into L and a new node L2)
• Redistribute entries evenly, copy up middle key k, insert (k, pointer to L2) into parent of L.
Splitting can happen recursively to non-leaf nodes• Redistribute entries evenly, but push up middle key.
(Contrast with leaf splits.)
Splits “grow” the tree! First wider, then one level taller when the root splits.
7
Previous Example
Root
17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
13
Inserting 8*
8
Inserting 8* into Example B+ Tree
Minimum occupancy is guaranteed in node splits.
2* 3* 5* 7* 8*
5Entry to be inserted in parent node.(Note that 5 iscontinues to appear in the leaf.)
copied up and
Copy up: key value of an inserted entry must appear in a leaf node!
9
Inserting 8* into Example B+ Tree
Note difference between copy-up and push-up. Reasons?
appears once in the index. Contrast
5 24 30
17
13
Entry to be inserted in parent node.(Note that 17 is pushed up and only
this with a leaf split.)
Push up: Any key value can appear at most once in non-leaf nodes of the tree!
2* 3* 5* 7* 8*
5
13 17 24 30
10
Example B+ Tree After Inserting 8*
Root was split, leading to increase in height!
2* 3*
Root
17
24 30
14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
135
7*5* 8*
11
Deleting a Data Entry from a B+ Tree
Start at root, find leaf L where entry belongs.Remove the entry.
If L is at least half-full, done! If L has only ⎡n/2⎤ - 1 entries,
• Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
• If re-distribution fails, merge L and sibling. Must delete index entry (pointing to L or sibling) from parent of L.
Merge could propagate to root, decreasing height.
12
Current B+ Tree
2* 3*
Root
17
24 30
14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
135
7*5* 8*
Delete 19*
Delete 20*
13
Example Tree After Deleting 19* and 20* ...
Deleting 19* is easy.Deleting 20* is done with re-distribution. Notice how middle key is copied up.
2* 3*
Root
17
30
14* 16* 33* 34* 38* 39*
135
7*5* 8* 22* 24*
27
27* 29*
14
New B+ Tree ...
2* 3*
Root
17
30
14* 16* 33* 34* 38* 39*
135
7*5* 8* 22* 24*
27
27* 29*
Delete 24*
×
××
××
15
... And Then Deleting 24*
Must merge nodes.Toss index entry (right) Pull down of index entry (below).
30
22* 27* 29* 33* 34* 38* 39*
2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39*5* 8*
Root30135 17
16
Example of Non-leaf Re-distribution
Tree is shown below during deletion of 24*. (What could be a possible initial tree?)In contrast to previous example, can re-distribute entry from left child of root to right child.
Root
135 17 20
22
30
14* 16* 17* 18* 20* 33* 34* 38* 39*22* 27* 29*21*7*5* 8*3*2*
17
After Re-distribution
Intuitively, entries are re-distributed by `pushingthrough’ the splitting entry in the parent node.It suffices to re-distribute index entry with key 20; we’ve re-distributed 17 as well for illustration.
14* 16* 33* 34* 38* 39*22* 27* 29*17* 18* 20* 21*7*5* 8*2* 3*
Root
135
17
3020 22
18
Summary
Tree-structured indexes are ideal for range-searches, also good for equality searches.B+ tree is a dynamic structure.
Inserts/deletes leave tree height-balanced; log F N cost.High fanout (F) means depth rarely more than 3 or 4.Almost always better than maintaining a sorted file.Typically, 67% occupancy on average.Usually preferable to ISAM, modulo lockingconsiderations; adjusts to growth gracefully.If data entries are data records, splits can change rids!