B Trees: Disk Based Tree Index

Copyright © 2003-2006 Curt Hill




Introduction

• A BTree is a multiway tree that usually resides on disk
  – Most trees in CS courses are binary and reside in memory
• The basis of many ISAM and VSAM implementations, as well as clustered and unclustered database indices
• The B is often said to stand for Bayer, the person who worked out the original scheme
  – It does not stand for binary!


History

• Bayer and McCreight published in 1972
• IBM made the BTree the basis of its Virtual Storage Access Method (VSAM) shortly thereafter
  – Later enlarged to replace the Indexed Sequential Access Method (ISAM)
• Since then the BTree has been the ISAM of choice


Older versions of ISAM

• Before BTrees, ISAM files existed
  – Allowed both random and sequential access
• Had some problems
  – Could not grow gracefully
    • The index tree was fixed
  – Had overflow areas
  – When performance degraded, the file had to be rebuilt


Older ISAM Problems

[Figure: ISAM file with an empty block and an overflow block]


Building ISAM Files

• Build the entire sequential sequence
• Create each page with some free space
• Build the index over the sequential file
• The index, once built, is never modified
  – Deleting a key removes it from the leaf, but it cannot be removed from the index
  – Too many inserts cause use of an overflow area
• Must be rebuilt from scratch
  – Unavailable during the rebuild
  – The build process could be lengthy
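For contrast with the BTree, here is a minimal Python sketch of the static ISAM organization just described. It is an illustration under stated assumptions, not any real ISAM product: the class name `StaticISAM`, the `fill` fraction used to leave free space, and the per-page overflow lists are all hypothetical choices. The index list is built once and never touched again.

```python
from __future__ import annotations
from bisect import bisect_right, insort


class StaticISAM:
    """Hypothetical sketch of an old-style ISAM file: sorted pages with free
    space, a fixed index over them, and an overflow area per page."""

    def __init__(self, sorted_keys: list[int], page_capacity: int, fill: float) -> None:
        per_page = max(1, int(page_capacity * fill))   # leave free space in each page
        self.capacity = page_capacity
        self.pages = [sorted_keys[i:i + per_page]
                      for i in range(0, len(sorted_keys), per_page)]
        # First key of every page after the first; built once, never modified.
        self.index = [page[0] for page in self.pages[1:]]
        self.overflow: list[list[int]] = [[] for _ in self.pages]

    def _page_for(self, key: int) -> int:
        return bisect_right(self.index, key)           # search the fixed index

    def insert(self, key: int) -> None:
        p = self._page_for(key)
        if len(self.pages[p]) < self.capacity:
            insort(self.pages[p], key)                 # use the page's free space
        else:
            self.overflow[p].append(key)               # page full: spill to overflow

    def delete(self, key: int) -> bool:
        p = self._page_for(key)
        for area in (self.pages[p], self.overflow[p]):
            if key in area:
                area.remove(key)                       # the index is not updated, even if it names this key
                return True
        return False

    def search(self, key: int) -> bool:
        p = self._page_for(key)
        return key in self.pages[p] or key in self.overflow[p]


isam = StaticISAM([10, 20, 30, 40, 50, 60, 70, 80], page_capacity=4, fill=0.5)
for k in (21, 22, 23):
    isam.insert(k)
print(isam.pages[0], isam.overflow[0])   # [10, 20, 21, 22] [23]
isam.delete(30)
print(isam.index)                        # [30, 50, 70]: 30 is gone from the file but not the index
```

Once a page fills, every further insert into its key range lands in the overflow list, which has to be scanned sequentially; in this organization the only cure is to rebuild the whole file.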


Characteristics

• Always balanced
  – All leaves are at the same level
• Does not waste space: all nodes except the root are at least half full
• Insertion and deletion do not require rewriting large parts of the tree
• Concurrency is well supported with minimal locking of pages


Variations

• The standard BTree will be discussed first
  – Seldom actually used
• The B+Tree and the B*Tree are the common variants
  – The modifications for these will also be discussed


Terminology

• The three terms that need to be understood are pointers, records and pages
• A pointer is a page identifier
  – The address of a page in any form
  – Usually compact
• The record is the item to be stored
  – Always includes a key and data
  – The key may have any form
• The page is the block or page of the file system
  – Must be larger than the record


BTree Node

• A BTree is a set of nodes of three types
• The root node is the beginning of the tree
  – The rest of the tree consists of descendants of the root
• A leaf node has no descendants
• Interior nodes have both ancestors and descendants


A tree

• Root: CD KL PB TS
• Leaves, one under each of the root's five pointers: AQ BJ CA | EK GK | KQ MA MZ NK | PF PP QA ST | TZ WW


Nodes

• Always one more pointer than records
• Always has between N and 2N records
  – Except the root
• N is chosen based upon
  – Page size
  – How many records and pointers fit
• The previous tree is a 4-5 tree
  – 4 records
  – 5 pointers
  – N = 2


Examples and Reality

• This presentation will show trees with small Ns: 1 or 2
  – These diagram nicely in PowerPoint
• Real trees have large Ns: 50-100
• The N determines fan-out
  – High fan-out is good
  – If fan-out is 2, then 50% of the tree is eliminated from a search at each level
  – If fan-out is 100, then 99% of the tree is eliminated from a search at each level
  – High fan-out makes a flat tree


Example numbers

• Suppose that a BTree has an average fan-out of 50
• Suppose that the BTree has 1 million entries
• 1 disk access gets the root
• 3 disk accesses later the search obtains the leaf
• A sequential search requires an average of 10,000 disk accesses (20,000 pages of 50 records, half of which are read on average)
• Even a binary search requires 20 disk accesses
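These figures are back-of-the-envelope arithmetic. A small sketch that reproduces them, assuming (as the 10,000 figure implies) that a page holds 50 records; `levels_needed` is a hypothetical helper that counts how many levels a tree of the given fan-out needs before it can reach every entry.

```python
def levels_needed(entries: int, fanout: int) -> int:
    """Smallest h with fanout**h >= entries: a rough count of tree levels,
    and so of disk accesses for one search."""
    h, reach = 0, 1
    while reach < entries:
        reach *= fanout
        h += 1
    return h


ENTRIES, FANOUT = 1_000_000, 50
RECORDS_PER_PAGE = 50   # assumption needed for the sequential-scan figure

print(levels_needed(ENTRIES, FANOUT))   # 4: the root plus 3 more accesses
pages = ENTRIES // RECORDS_PER_PAGE      # 20,000 pages
print(pages // 2)                        # 10000: average pages read by a sequential search
print(levels_needed(ENTRIES, 2))         # 20: probes for a binary search
```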


Disk and Memory

• Although the BTree requires fewer disk accesses, it requires more comparisons
• In the previous example the BTree does 75 comparisons, while the binary search does 20
• The important delay is disk access speed
• In the time of one disk page retrieval, thousands or millions of comparisons could be done


Searching a BTree

1. Start at the root
2. Is this key in this node?
3. Yes – stop, you are done
4. No – is this a leaf?
5. Yes – this key does not exist; stop
6. No – find the pointer that is between the two surrounding values
7. Fetch this node; go to step 2
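The search steps translate almost line for line into code. A minimal Python sketch, assuming each node is an in-memory object holding its sorted keys and a list of child nodes (empty for a leaf); `Node` and `search` are hypothetical names, and a real BTree would fetch each child page from disk at step 7. Within a node it uses `bisect_left`, a binary search, rather than a sequential scan. The sample tree is the two-level 4-5 tree from the earlier slide, read as root CD KL PB TS over five leaves.

```python
from __future__ import annotations
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[str]
    children: list[Node] = field(default_factory=list)   # empty list means leaf


def search(root: Node, key: str) -> bool:
    node = root                                  # 1. start at the root
    while True:
        i = bisect_left(node.keys, key)          # first key >= the search key
        if i < len(node.keys) and node.keys[i] == key:
            return True                          # 2-3. key is in this node
        if not node.children:
            return False                         # 4-5. leaf: key does not exist
        node = node.children[i]                  # 6-7. follow the pointer between the surrounding keys


tree = Node(["CD", "KL", "PB", "TS"], [
    Node(["AQ", "BJ", "CA"]),
    Node(["EK", "GK"]),
    Node(["KQ", "MA", "MZ", "NK"]),
    Node(["PF", "PP", "QA", "ST"]),
    Node(["TZ", "WW"]),
])

print(search(tree, "MZ"))   # True
print(search(tree, "MB"))   # False
```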


Look for MZ

• Search the root: CD KL PB TS
• MZ is not in the root
• MZ falls between KL and PB, so access the node under that pointer: KQ MA MZ NK
• Search this node: MZ is found
• This node is a leaf, so the search terminates here in any case


Insertion and Deletion

• The means by which new data is inserted and old data is deleted is crucial to maintaining a BTree
• These techniques were developed by Bayer, and their effectiveness caused this form of tree to catch on
• The tree is never reorganized
  – Reorganization was a disadvantage of older ISAMs
  – Insertion and deletion do all the work


Insertion

1. Find the leaf that should contain the inserted value
2. Insert the record
3. Does the node have 2N or fewer records?
4. Yes – stop
5. No – split the node
   1. Make two nodes of N records: the first N records and the last N records
   2. Promote the middle item into the ancestor
   3. Go back to step 2, now inserting the promoted item into the ancestor
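A minimal Python sketch of insertion with splitting, assuming in-memory nodes and integer keys; `BTree`, `Node` and `levels` are hypothetical names. The recursive helper returns the promoted middle item and the new right node whenever a node splits, so the ancestor can absorb it (step 2 again); when the root itself splits, a new root is created. The usage replays the insertion sequence drawn in the example slides that follow, with N = 1.

```python
from __future__ import annotations
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int] = field(default_factory=list)
    children: list[Node] = field(default_factory=list)   # empty for a leaf


class BTree:
    def __init__(self, n: int) -> None:
        self.n = n              # every node except the root holds N to 2N records
        self.root = Node()

    def insert(self, key: int) -> None:
        split = self._insert(self.root, key)
        if split is not None:   # the root split: new root, tree one level deeper
            middle, right = split
            self.root = Node([middle], [self.root, right])

    def _insert(self, node: Node, key: int):
        i = bisect_left(node.keys, key)
        if not node.children:                  # 1-2. reached the leaf: insert the record
            node.keys.insert(i, key)
        else:
            split = self._insert(node.children[i], key)
            if split is None:
                return None
            middle, right = split              # promoted item arrives from the child
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
        if len(node.keys) <= 2 * self.n:       # 3-4. 2N or fewer records: stop
            return None
        n = self.n                             # 5. split: first N stay, last N move right
        middle = node.keys[n]
        right = Node(node.keys[n + 1:], node.children[n + 1:])
        del node.keys[n:]
        del node.children[n + 1:]
        return middle, right                   # the middle item goes to the ancestor


def levels(tree: BTree) -> list[list[list[int]]]:
    """Keys of every node, level by level, for printing."""
    out, level = [], [tree.root]
    while level:
        out.append([node.keys for node in level])
        level = [c for node in level for c in node.children]
    return out


t = BTree(n=1)   # a 2-3 tree
for k in [20, 40, 30, 15, 35, 7, 26, 18]:
    t.insert(k)
print(levels(t))   # [[[20]], [[15], [30]], [[7], [18], [26], [35, 40]]]
```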


Example Insertion

• Suppose a 2-3 tree
  – N = 1
  – Key is a simple integer
• Consider the following insertions:
  – 20, 40, 30, 15, 35, 7, 26, 18


Example Insertions – First Five

• Start with root: 20 40
• Insert 30: 20 30 40
  – Node is overfull – split
  – Middle item is promoted
  – Rest is divided into first and last nodes
  – Result: root 30; leaves 20 | 40
• Inserting 15 and 35 is painless
  – Result: root 30; leaves 15 20 | 35 40


Example Insertion – Next One

• Start with root 30; leaves 15 20 | 35 40
• Insert 7: the leaf becomes 7 15 20
  – Node is overfull – split
  – Middle item (15) is promoted
  – Rest is divided into first and last nodes
  – Result: root 15 30; leaves 7 | 20 | 35 40


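Example Insertion – 26 and 18

• Inserting 26 is painless
  – Result: root 15 30; leaves 7 | 20 26 | 35 40
• Inserting 18 splits the node 18 20 26, promoting 20
  – This splits the root 15 20 30, promoting 20 again
  – Creates a new root
  – Tree is one level deeper: root 20; interior 15 | 30; leaves 7 | 18 | 26 | 35 40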


Insertion Comments

• Only the nodes on the path between the root and the leaf receiving the inserted record may be modified
• This did not appear as significant in the example as it actually is when N is large
• Usually most of the tree is undisturbed
• The root of the BTree that is currently being accessed is usually in memory


Deletion

1. Find the leaf that contains the value
2. Delete the record
3. Does the node have N or more records?
4. Yes – stop
5. No – merge the nodes
   1. Remove an item from the ancestor
   2. Pull it into the node that is short, combining that node with its neighbor
   3. This may reduce the number of levels
   4. Go back to step 2 to delete the item from the ancestor


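Deletion Example

• Start with root 30; leaves 15 20 | 35 40
• Deleting 20 and 40 is painless
  – Result: root 30; leaves 15 | 35
• Deleting any one of the remaining three merges three into one
  – For example, deleting 35 leaves a single node: 15 30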


Splitting/Merging

• Relatively expensive
  – We wish to avoid it if we can
• Before splitting or merging, look for a neighbor to carry or borrow an item
  – Do not split unless both neighbors are full
  – Do not merge unless both neighbors are at minimum


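Carrying Example

• Inserting 18 without a split
  – Before: root 15 30; leaves 7 | 20 26 | 35 40
  – 18 belongs in the full leaf 20 26, but its left neighbor 7 has room
  – Rotate 18 into the root and 15 down into the neighbor
  – After: root 18 30; leaves 7 15 | 20 26 | 35 40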


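Borrowing Example

• Deleting 18
  – Before: root 20; interior 15 | 30; leaves 7 9 | 18 | 22 26 | 35 40
  – Deleting 18 empties its leaf, but the left neighbor 7 9 has an item to spare
  – Rotate 9 and 15: 9 moves up into the ancestor and 15 moves down into the short leaf
  – After: root 20; interior 9 | 30; leaves 7 | 15 | 22 26 | 35 40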
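The deletion steps and the rule of borrowing from a neighbor before merging can be combined into one sketch. This is a minimal Python illustration under assumptions the slides leave open: nodes are in memory, keys are unique integers, a key found in an interior node is first replaced by its predecessor (the largest key in its left subtree, which lives in a leaf), and the left neighbor is tried before the right. `Node`, `delete`, `_fix_underflow` and `levels` are hypothetical names. The usage replays the borrowing example (deleting 18).

```python
from __future__ import annotations
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int]
    children: list[Node] = field(default_factory=list)   # empty for a leaf


def delete(root: Node, key: int, n: int) -> Node:
    """Delete key; returns the (possibly new) root."""
    _delete(root, key, n)
    if not root.keys and root.children:     # root emptied by a merge: tree loses a level
        return root.children[0]
    return root


def _delete(node: Node, key: int, n: int) -> None:
    i = bisect_left(node.keys, key)
    found = i < len(node.keys) and node.keys[i] == key
    if not node.children:                   # leaf: delete the record
        if found:
            node.keys.pop(i)
        return
    if found:                               # interior: swap in the predecessor from a leaf
        pred = node.children[i]
        while pred.children:
            pred = pred.children[-1]
        node.keys[i] = pred.keys[-1]
        key = pred.keys[-1]                 # now delete the predecessor from that leaf
    _delete(node.children[i], key, n)
    _fix_underflow(node, i, n)


def _fix_underflow(parent: Node, i: int, n: int) -> None:
    child = parent.children[i]
    if len(child.keys) >= n:                # N or more records: stop
        return
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    if left is not None and len(left.keys) > n:       # borrow: rotate through the ancestor
        child.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
        if left.children:
            child.children.insert(0, left.children.pop())
    elif right is not None and len(right.keys) > n:   # borrow from the right instead
        child.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
        if right.children:
            child.children.append(right.children.pop(0))
    else:                                             # both neighbors at minimum: merge
        j = i - 1 if left is not None else i          # merge children[j] and children[j + 1]
        a, b = parent.children[j], parent.children[j + 1]
        a.keys += [parent.keys.pop(j)] + b.keys       # pull the ancestor's item down
        a.children += b.children
        parent.children.pop(j + 1)


def levels(root: Node) -> list[list[list[int]]]:
    out, level = [], [root]
    while level:
        out.append([node.keys for node in level])
        level = [c for node in level for c in node.children]
    return out


root = Node([20], [
    Node([15], [Node([7, 9]), Node([18])]),
    Node([30], [Node([22, 26]), Node([35, 40])]),
])
root = delete(root, 18, n=1)
print(levels(root))   # [[[20]], [[9], [30]], [[7], [15], [22, 26], [35, 40]]]
```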


Utilization

• Each node of a BTree except the root is between ½ full and full
  – Expected utilization is about ¾ full when the space occupied by the root is negligible
• Carrying records to adjacent nodes tends to increase this
• Borrowing tends to decrease it
• Stable trees may also be compacted
  – Look for adjacent nodes that could be merged


BTree Variants: B*Tree

• B*Tree
  – A BTree with only keys in the interior and root nodes
  – Data is all in leaves
  – Since the key is generally much smaller than the data, this greatly increases fan-out
  – Two different kinds of pages
    • Leaves and root/interior nodes


B*Tree

• Root 20; interior 15 | 30; leaves 7 15 | 18 20 | 26 30 | 35 40
• Interior node values are duplicated in leaves
• A leaf will usually hold fewer items than interior nodes


BTree Variants: B+Tree

• A B*Tree
  – Keys only in ancestral nodes
  – Data only in leaves
• Connect the leaves into a linked list
• Foundation of most ISAMs
• Follow the leaves for sequential access
  – Not slower than a normal heap file
• Search by key for random access


Fanout of B+Tree

• Suppose that:
  – Key is 10 bytes
  – Data is 79 bytes
  – Pointer is 4 bytes
  – Page size is 512 bytes
• N = 2 is the largest node for a BTree
  – 2-4 records per node: 4 × 89 + 5 × 4 = 376 bytes, while N = 3 would need 6 × 89 + 7 × 4 = 562
• N = 18 is the largest node for a B*Tree
  – 18-36 keys per interior/root node: 36 × 10 + 37 × 4 = 508 bytes
  – N = 2 for leaf nodes
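The node sizes follow from one inequality: a full node holds 2N items and 2N + 1 pointers, and it must fit in a page. A minimal sketch, assuming the page holds nothing else (no header or free-space bookkeeping, which real systems do store); `max_n` is a hypothetical helper. For leaves, which hold records but no child pointers, pass a pointer size of 0.

```python
def max_n(item_bytes: int, pointer_bytes: int, page_bytes: int) -> int:
    """Largest N such that a full node (2N items and 2N + 1 pointers) fits in a page."""
    return (page_bytes - pointer_bytes) // (2 * (item_bytes + pointer_bytes))


KEY, DATA, POINTER, PAGE = 10, 79, 4, 512

print(max_n(KEY + DATA, POINTER, PAGE))   # 2: BTree, whole records in every node
print(max_n(KEY, POINTER, PAGE))          # 18: B*Tree root/interior, keys only
print(max_n(KEY + DATA, 0, PAGE))         # 2: B*Tree leaf, records and no child pointers
```

The same helper gives N = 11 for 19-byte keys with 4-byte page ids in a 512-byte page.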


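B+Tree

4-5 tree with 2-record leaves
• Root: 10 20
• Interior nodes (tree pointers down to the leaves): 2 5 | 16 18 | 27 32 40 55
• Leaves, chained left to right by list pointers: 0 2 | 3 5 | 7 10 | 12 16 | 17 18 | 19 20 | 22 27 | 29 32 | 35 40 | 42 55 | 62 70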
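Sequential access in a B+Tree means one random descent to the first leaf of interest, then following list pointers. A minimal Python sketch over the 4-5 tree with 2-record leaves, assuming each separator is the largest key of the subtree to its left (which matches the figure: 10 is the last key under the root's first child), so a key equal to a separator goes left. `Leaf`, `Interior`, `find_leaf` and `range_scan` are hypothetical names.

```python
from __future__ import annotations
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Leaf:
    keys: list[int]
    next: Leaf | None = None     # list pointer to the next leaf


@dataclass
class Interior:
    keys: list[int]
    children: list               # tree pointers to Interior or Leaf nodes


def find_leaf(node, key: int) -> Leaf:
    while isinstance(node, Interior):        # random access: descend by key
        node = node.children[bisect_left(node.keys, key)]
    return node


def range_scan(root, low: int, high: int) -> list[int]:
    leaf, out = find_leaf(root, low), []
    while leaf is not None:                  # sequential access: follow the list pointers
        for k in leaf.keys:
            if k > high:
                return out
            if k >= low:
                out.append(k)
        leaf = leaf.next
    return out


leaf_keys = [[0, 2], [3, 5], [7, 10], [12, 16], [17, 18], [19, 20],
             [22, 27], [29, 32], [35, 40], [42, 55], [62, 70]]
leaves = [Leaf(k) for k in leaf_keys]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Interior([10, 20], [
    Interior([2, 5], leaves[0:3]),
    Interior([16, 18], leaves[3:6]),
    Interior([27, 32, 40, 55], leaves[6:11]),
])
print(range_scan(root, 15, 30))   # [16, 17, 18, 19, 20, 22, 27, 29]
```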


Key Lengths

• Many keys are relatively long
  – An integer is typically 4 bytes
  – Character string keys may be much longer
    • Names
    • Product codes
  – These are very sparse keys
  – They also take up too much space in the tree


Key Compression

• The whole key is not always needed in the tree
• Abbreviate the key to shorten it
• Lose the ability to determine whether the key is present without going to the leaves
• Gain greater fan-out and a flatter tree
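One common way to abbreviate keys, sketched here as an assumption rather than the specific method these slides have in mind, is to store in the index only the shortest prefix of the right page's first key that is still greater than the left page's last key. The hypothetical helpers below build such separators for the name leaves from the Example Indices slide. Searching for "Lane" routes to the page Lee Mic More, but only reading that leaf reveals that Lane is absent.

```python
from bisect import bisect_right


def shortest_separator(left_max: str, right_min: str) -> str:
    """Shortest prefix s of right_min with left_max < s <= right_min.
    Keys < s belong to the left page, keys >= s to the right page."""
    if not left_max < right_min:
        raise ValueError("pages must be in key order")
    for length in range(1, len(right_min) + 1):
        candidate = right_min[:length]
        if candidate > left_max:
            return candidate
    return right_min   # not reached: the full right_min is always > left_max


def abbreviated_index(pages: list[list[str]]) -> list[str]:
    return [shortest_separator(a[-1], b[0]) for a, b in zip(pages, pages[1:])]


pages = [["Abel", "Bart", "Cal"], ["Dan", "Kline"], ["Lee", "Mic", "More"],
         ["Mule", "Robb"], ["Sand", "Tax", "Tee"], ["Tone", "Tu", "Zone"]]
seps = abbreviated_index(pages)
print(seps)                               # ['D', 'L', 'Mu', 'S', 'To']
print(pages[bisect_right(seps, "Lane")])  # ['Lee', 'Mic', 'More']: read it to learn Lane is absent
```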


Key Compression Savings

• The fan-out may be increased by reducing the size of the key
• Suppose a 19-character key and a 4-byte page id in a 512-byte page
• This makes for N = 11
  – A 22-23 tree (22 × 19 + 23 × 4 = 510 bytes)
• Reduce the key to 4 bytes
  – N = 31, a 62-63 tree (62 × 4 + 63 × 4 = 500 bytes)


Creating a B+Tree

• Two ways:
  – Insert as has been shown
  – Bulk load
• The normal insertion scheme works well for regular insertions
• Bulk load works best for a large number of insertions that create a new B+Tree


Bulk Load

• Sort the data
• Create the leaves in order and build the index over them
• The index uses the normal split mechanism


Bulk Loading

• Add 2, 3, 5, 7: leaves 2 3 | 5 7; index 3
• Add 10, 11, 15, 17: leaves 2 3 | 5 7 | 10 11 | 15 17; index 3 7 11
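A minimal sketch of the bulk load, assuming integer keys already in sorted order, fixed-size leaves chained for sequential access, and an index whose separators are the largest key of each left leaf (matching the figure's 3 7 11). Each new leaf's separator is appended to the rightmost index node, which then splits by the normal mechanism. `bulk_load`, `_append`, `Leaf` and `Index` are hypothetical names; with n = 2 (up to four keys per index node) it reproduces the figure.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Leaf:
    keys: list[int]
    next: Leaf | None = None


@dataclass
class Index:
    keys: list[int]
    children: list   # Index or Leaf nodes; one more child than keys


def _append(node: Index, key: int, child, n: int):
    """Add (key, child) at the right edge; split normally if the node overflows.
    Returns (promoted_key, new_right_node) on a split, else None."""
    if isinstance(node.children[-1], Index):       # not yet at the lowest index level
        split = _append(node.children[-1], key, child, n)
        if split is None:
            return None
        key, child = split
    node.keys.append(key)
    node.children.append(child)
    if len(node.keys) <= 2 * n:
        return None
    middle = node.keys[n]                           # first n keys stay, middle goes up
    right = Index(node.keys[n + 1:], node.children[n + 1:])
    del node.keys[n:]
    del node.children[n + 1:]
    return middle, right


def bulk_load(sorted_keys: list[int], leaf_size: int, n: int) -> Index | Leaf:
    # 1. the data is already sorted: cut it into full leaves and chain them
    leaves = [Leaf(sorted_keys[i:i + leaf_size])
              for i in range(0, len(sorted_keys), leaf_size)]
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    if not leaves:
        return Leaf([])
    if len(leaves) == 1:
        return leaves[0]
    # 2. build the index over the leaves; separator = largest key of the left leaf
    root = Index([leaves[0].keys[-1]], [leaves[0], leaves[1]])
    for prev, leaf in zip(leaves[1:], leaves[2:]):
        split = _append(root, prev.keys[-1], leaf, n)
        if split is not None:                      # root split: new root, one level deeper
            root = Index([split[0]], [root, split[1]])
    return root


root = bulk_load([2, 3, 5, 7, 10, 11, 15, 17], leaf_size=2, n=2)
print(root.keys)   # [3, 7, 11]
```

With n = 1 the same input overflows the index node, which splits into 3 and 11 under a new root 7.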


Clustered

• All the examples so far are clustered
• The lowest interior node has only one pointer to a block
  – Between two keys is a page of several entries
  – One key addresses many data items
• B+Trees may also handle an unclustered index


Unclustered BTree Index

• Essentially the same as a regular B+Tree, with several exceptions
  – Uses another tree's leaves
  – No data at all
  – In the last interior level, each key points at a separate leaf page
  – The leaves are in a completely different order than the interior nodes


Example Indices

• Name index: root More; interior Cal Kline | Robb Tee; leaves Abel Bart Cal | Dan Kline | Lee Mic More | Mule Robb | Sand Tax Tee | Tone Tu Zone
• Number index: root 301 412 600; leaves 201 285 295 301 | 303 307 352 412 | 439 450 472 513 600 | 601 672 720


Keys

• Primary keys get a clustered index
• A unique attribute usually forces an unclustered index
• An index may be attached to a set of fields that are not unique


Nodes in Memory

• A normal BTree node holds only a list of keys and pointers
  – No other organizing information is stored in the disk node
• Depending on its size, the list may be searched using a sequential or binary search
• Once the node is in memory, a sequential search may not be the best choice
• Often we wish to keep the top levels of a BTree in memory for speed


Red-Black Trees

• A binary tree superimposed upon a BTree structure
• Each pointer must show whether its target is in the same BTree node or a different one
• The red arrows stay within a node; the black arrows span nodes
  – Hence the name red-black tree


Consider this tree

• Root: 10 20
• Interior: 3 6 | 14 | 30
• Leaves: 1 2 | 4 5 | 7 9 | 11 12 | 15 | 22 26 | 35 40


Superimposed Red Black Tree

[Figure: the same tree with a binary tree superimposed on it; red arrows join keys within a node and black arrows join nodes]


Red-Black Trees Again

• However, the tree may be searched like any other binary search tree
• Insertions and deletions are somewhat more complicated because they must conform to both patterns:
  – Binary search tree
  – BTree
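The correspondence can be made concrete. This minimal Python sketch converts each BTree node of one to three keys into binary nodes: the node's top key is black and its other keys hang below it as red children, so red pointers never leave a BTree node. Putting the larger key on top for two-key nodes is an arbitrary choice, and `BNode`, `RB`, `to_red_black` and the helpers are hypothetical names. Search ignores the colors entirely; every path down passes the same number of black nodes because the BTree's leaves are all at one level. The tree is the one from the Consider this tree slide, read as root 10 20 over interior nodes 3 6, 14 and 30.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class BNode:                     # BTree node with 1 to 3 keys
    keys: list[int]
    children: list[BNode] = field(default_factory=list)   # empty for a leaf


@dataclass
class RB:                        # binary node; red = same BTree node as its parent
    key: int
    red: bool
    left: RB | None = None
    right: RB | None = None


def to_red_black(node: BNode) -> RB:
    kids = [to_red_black(c) for c in node.children] or [None] * (len(node.keys) + 1)
    k = node.keys
    if len(k) == 1:
        return RB(k[0], False, kids[0], kids[1])
    if len(k) == 2:   # larger key on top, smaller key as its red left child
        return RB(k[1], False, RB(k[0], True, kids[0], kids[1]), kids[2])
    if len(k) == 3:   # middle key on top, two red children
        return RB(k[1], False, RB(k[0], True, kids[0], kids[1]),
                  RB(k[2], True, kids[2], kids[3]))
    raise ValueError("only nodes with 1 to 3 keys map onto a red-black tree")


def search(t: RB | None, key: int) -> bool:
    while t is not None:         # plain binary search tree lookup; colors are ignored
        if key == t.key:
            return True
        t = t.left if key < t.key else t.right
    return False


def inorder(t: RB | None) -> list[int]:
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)


def black_height(t: RB | None) -> int:
    """Black nodes (counting the empty leaf) on every path down; equal on all paths."""
    if t is None:
        return 1
    left, right = black_height(t.left), black_height(t.right)
    if left != right:
        raise ValueError("black heights differ")
    return left + (0 if t.red else 1)


btree = BNode([10, 20], [
    BNode([3, 6], [BNode([1, 2]), BNode([4, 5]), BNode([7, 9])]),
    BNode([14], [BNode([11, 12]), BNode([15])]),
    BNode([30], [BNode([22, 26]), BNode([35, 40])]),
])
rb = to_red_black(btree)
print(search(rb, 14), search(rb, 8))   # True False
print(black_height(rb))                # 4: three BTree levels plus the empty leaf
```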


Why are BTrees Popular?

• Self-organizing
• Any type of key
• Usually flat trees
  – Small number of accesses
• Average 75% utilization
• Can be created easily