49
B-trees - Hashing B-trees - Hashing

B-trees - Hashing. 11.2Database System Concepts Review: B-trees and B+-trees Multilevel, disk-aware, balanced index methods primary or secondary dense

  • View
    218

  • Download
    2

Embed Size (px)

Citation preview

B-trees - HashingB-trees - Hashing

11.2Database System Concepts

Review: B-trees and B+-treesReview: B-trees and B+-trees

Multilevel, disk-aware, balanced index methods

•primary or secondary• dense or sparse•supports selection and range queries

B+-trees: most common indexing structure in databases

• all actual values stored on leaf-nodes.

Optimality: space O(N/B), updates O(log B (N/B)), queries O(log B (N/B)+K/B) (B is the fan out of a node)

11.3Database System Concepts

Root

B+Tree Example Order= 4

100

120

150

180

30

3 5 11

30

35

100

101

110

120

130

150

156

179

180

200

11.4Database System Concepts

Full node min. node

Non-leaf

Leaf

n=4

12

01

50

18

0

30

3 5 11

30

35

11.5Database System Concepts

B+tree rulesB+tree rules tree of order tree of order nn

(1) All leaves at same lowest level (balanced tree)

(2) Pointers in leaves point to records except for “sequence pointer”

(3) Number of pointers/keys for B+tree

Non-leaf(non-root) n n-1 n/2 n/2- 1

Leaf(non-root) n n-1

Root n n-1 2 1

Max Max Min Min ptrs keys ptrs keys

(n-1)/2(n-1)/2

11.6Database System Concepts

Insert into B+treeInsert into B+tree

(a) simple case

space available in leaf

(b) leaf overflow

(c) non-leaf overflow

(d) new root

11.7Database System Concepts

(a) Insert key = 32 n=43 5 11

30

31

30

100

32

11.8Database System Concepts

(a) Insert key = 7 n=4

3 5 11

30

31

30

100

3 5

7

7

11.9Database System Concepts

(c) Insert key = 160n=4

10

0

120

150

180

150

156

179

180

200

160

18

0

160

179

11.10Database System Concepts

(d) New root, insert 45 n=4

10

20

30

1 2 3 10

12

20

25

30

32

40

40

45

40

30new root

11.11Database System Concepts

(a) Simple case - no example

(b) Coalesce with neighbor (sibling)

(c) Re-distribute keys

(d) Cases (b) or (c) at non-leaf

Deletion from B+treeDeletion from B+tree

11.12Database System Concepts

(b) Coalesce with sibling Delete 50

10

40

100

10

20

30

40

50

n=5

40

11.13Database System Concepts

(c) Redistribute keys Delete 50

10

40

100

10

20

30

35

40

50

n=5

35

35

11.14Database System Concepts

40

45

30

37

25

26

20

22

10

141 3

10

20

30

40

(d) Non-leaf coalesce Delete 37 n=5

40

30

25

25

new root

11.15Database System Concepts

Selection QueriesSelection Queries

Yes! Hashing

static hashing

dynamic hashing

B+-tree is perfect, but....

to answer a selection query (ssn=10) needs to traverse a full path.

In practice, 3-4 block accesses (depending on the height of the tree, buffering)

Any better approach?

11.16Database System Concepts

HashingHashing

Hash-based indexes are best for equality selections. Cannot support range searches.

Static and dynamic hashing techniques exist; trade-offs similar to ISAM vs. B+ trees.

11.17Database System Concepts

Static HashingStatic Hashing # primary pages fixed, allocated sequentially, never de-allocated;

overflow pages if needed.

h(k) MOD N= bucket to which data entry with key k belongs. (N = # of buckets)

h(key) mod N

hkey

Primary bucket pages Overflow pages

10

N-1

11.18Database System Concepts

Static Hashing (Contd.)Static Hashing (Contd.)

Buckets contain data entries.

Hash fn works on search key field of record r. Use its value MOD N to distribute values over range 0 ... N-1. h(key) = (a * key + b) usually works well.

a and b are constants; lots known about how to tune h.

Long overflow chains can develop and degrade performance. extendable and Linear Hashing: Dynamic techniques to fix this problem.

11.19Database System Concepts

extendable Hashingextendable Hashing

Situation: Bucket (primary page) becomes full. Why not re-organize file by doubling # of buckets? Reading and writing all pages is expensive!

Idea: Use directory of pointers to buckets, double # of buckets by doubling the directory, splitting just the bucket that overflowed! Directory much smaller than file, so doubling it is much cheaper. Only one

page of data entries is split. No overflow page!

Trick lies in how hash function is adjusted!

11.20Database System Concepts

ExampleExample

13*00

0110

11

2

2

1

2

LOCAL DEPTH

GLOBAL DEPTH

DIRECTORY

Bucket A

Bucket B

Bucket C10*

1* 7*

4* 12* 32* 16*

5*

we denote r by h(r).

• Directory is array of size 4.• Bucket for record r has entry with index = `global depth’ least

significant bits of h(r);– If h(r) = 5 = binary 101, it is in bucket pointed to by 01.– If h(r) = 7 = binary 111, it is in bucket pointed to by 11.

11.21Database System Concepts

Handling InsertsHandling Inserts

Find bucket where record belongs. If there’s room, put it there. Else, if bucket is full, split it:

increment local depth of original page allocate new page with new local depth re-distribute records from original page. add entry for the new page to the directory

11.22Database System Concepts

Example: Insert 21, then 19, 15Example: Insert 21, then 19, 15

13*00

0110

11

2

2

LOCAL DEPTH

GLOBAL DEPTH

DIRECTORY

Bucket A

Bucket B

Bucket C

2Bucket D

DATA PAGES

10*

1* 7*

24* 12* 32* 16*

15*7* 19*

5*

21 = 10101

19 = 10011

15 = 01111

1221*

11.23Database System Concepts

24* 12* 32*16*

Insert Insert hh(r)=20 (Causes Doubling)(r)=20 (Causes Doubling)

00

01

10

11

2 2

2

2

LOCAL DEPTH

GLOBAL DEPTHBucket A

Bucket B

Bucket C

Bucket D

1* 5* 21*13*

10*

15* 7* 19*

(`split image'of Bucket A)

20*

3

Bucket A24* 12*

of Bucket A)

3

Bucket A2(`split image'

4* 20*12*

2

Bucket B1* 5* 21*13*

10*

2

19*

2

Bucket D15* 7*

3

32*16*

LOCAL DEPTH

000

001

010

011

100

101

110

111

3

GLOBAL DEPTH

3

32*16*

11.24Database System Concepts

Points to NotePoints to Note

20 = binary 10100. Last 2 bits (00) tell us r belongs in either A or A2. Last 3 bits needed to tell which. Global depth of directory: Max # of bits needed to tell which bucket an entry

belongs to.

Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.

When does bucket split cause directory doubling? Before insert, local depth of bucket = global depth. Insert causes local depth

to become > global depth; directory is doubled by copying it over and `fixing’ pointer to split image page.

11.25Database System Concepts

Directory DoublingDirectory Doubling

00

01

10

11

2

Why use least significant bits in directory? Allows for doubling via copying!

3

000

001

010

011

100

101

110

111

vs.

0

1

1

6*6*

6*

6 = 110

00

10

01

11

2

3

0

1

1

6*6*

6*

6 = 110000

100

010

110

001

101

011

111

Least Significant Most Significant

11.26Database System Concepts

Comments on extendable HashingComments on extendable Hashing

If directory fits in memory, equality search answered with one disk access; else two. 100MB file, 100 bytes/rec, 4K pages contains 1,000,000 records (as data

entries) and 25,000 directory elements; chances are high that directory will fit in memory.

Directory grows in spurts, and, if the distribution of hash values is skewed, directory can grow large.

Multiple entries with same hash value cause problems!

Delete: If removal of data entry makes bucket empty, can be merged with `split image’. If each directory element points to same bucket as its split image, can halve directory.

11.27Database System Concepts

Extendable Hashing vs. Other SchemesExtendable Hashing vs. Other Schemes

Benefits of extendable hashing: Hash performance does not degrade with growth of file Minimal space overhead

Disadvantages of extendable hashing Extra level of indirection to find desired record Bucket address table may itself become very big (larger than

memory) Cannot allocate very large contiguous areas on disk either Solution: B+-tree structure to locate desired record in bucket address

table

Changing size of bucket address table is an expensive operation

Linear hashing is an alternative mechanism Allows incremental growth of its directory (equivalent to bucket

address table) At the cost of more bucket overflows

11.28Database System Concepts

Comparison of Ordered Indexing and HashingComparison of Ordered Indexing and Hashing

Cost of periodic re-organization Relative frequency of insertions and deletions Is it desirable to optimize average access time at the expense of

worst-case access time? Expected type of queries:

Hashing is generally better at retrieving records having a specified value of the key.

If range queries are common, ordered indices are to be preferred

In practice: PostgreSQL supports hash indices, but discourages use due to poor

performance Oracle supports B+trees, static hash organization, but not hash

indices SQLServer supports only B+-trees

11.29Database System Concepts

Bitmap IndicesBitmap Indices

Bitmap indices are a special type of index designed for efficient querying on multiple keys

Very effective on attributes that take on a relatively small number of distinct values E.g. gender, country, state, …

E.g. income-level (income broken up into a small number of levels such as 0-9999, 10000-19999, 20000-50000, 50000- infinity)

A bitmap is simply an array of bits For each gender, we associate a bitmap, where each bit represents

whether or not the corresponding record has that gender.

11.30Database System Concepts

Bitmap Indices (Cont.)Bitmap Indices (Cont.)

In its simplest form a bitmap index on an attribute has a bitmap for each value of the attribute Bitmap has as many bits as records

In a bitmap for value v, the bit for a record is 1 if the record has the value v for the attribute, and is 0 otherwise

11.31Database System Concepts

Bitmap Indices (Cont.)Bitmap Indices (Cont.)

Bitmap indices are useful for queries on multiple attributes not particularly useful for single attribute queries

Queries are answered using bitmap operations Intersection (and) Union (or) Complementation (not)

Each operation takes two bitmaps of the same size and applies the operation on corresponding bits to get the result bitmap E.g. 100110 AND 110011 = 100010

100110 OR 110011 = 110111 NOT 100110 = 011001

Males with income level L1: And’ing of Males bitmap with Income Level L1 bitmap

10010 AND 10100 = 10000 Can then retrieve required tuples. Counting number of matching tuples is even faster

11.32Database System Concepts

Bitmap Indices (Cont.)Bitmap Indices (Cont.)

Bitmap indices generally very small compared with relation size E.g. if record is 100 bytes, space for a single bitmap is 1/800 of space

used by relation. If number of distinct attribute values is 8, bitmap is only 1% of relation size

Deletion needs to be handled properly Existence bitmap to note if there is a valid record at a record location

Needed for complementation not(A=v): (NOT bitmap-A-v) AND ExistenceBitmap

Should keep bitmaps for all values, even null value To correctly handle SQL null semantics for NOT(A=v):

intersect above result with (NOT bitmap-A-Null)

11.33Database System Concepts

Efficient Implementation of Bitmap OperationsEfficient Implementation of Bitmap Operations

Bitmaps are packed into words; a single word and (a basic CPU instruction) computes and of 32 or 64 bits at once E.g. 1-million-bit maps can be and-ed with just 31,250 instruction

Counting number of 1s can be done fast by a trick: Use each byte to index into a precomputed array of 256 elements

each storing the count of 1s in the binary representation Can use pairs of bytes to speed up further at a higher memory cost

Add up the retrieved counts

Bitmaps can be used instead of Tuple-ID lists at leaf levels of B+-trees, for values that have a large number of matching records Worthwhile if > 1/64 of the records have that value, assuming a

tuple-id is 64 bits

Above technique merges benefits of bitmap and B+-tree indices

11.34Database System Concepts

Index Definition in SQLIndex Definition in SQL

Create a B-tree index (default in most databases)create index <index-name> on <relation-name>

(<attribute-list>)

-- create index b-index on branch(branch_name)

-- create index ba-index on branch(branch_name, account) -- concatenated index

-- create index fa-index on branch(func(balance, amount)) – function index

Use create unique index to indirectly specify and enforce the condition that the search key is a candidate key.

Hash indexes: not supported by every database (but implicitly in joins,…) PostgresSQL has it but discourages due to performance

Create a bitmap indexcreate bitmap index <index-name> on <relation-name>

(<attribute-list>)

- For attributes with few distinct values

- Mainly for decision-support(query) and not OLTP (do not support updates efficiently)

To drop any index drop index <index-name>

End of ChapterEnd of Chapter

11.36Database System Concepts

Partitioned HashingPartitioned Hashing

Hash values are split into segments that depend on each attribute of the search-key.

(A1, A2, . . . , An) for n attribute search-key

Example: n = 2, for customer, search-key being (customer-street, customer-city)

search-key value hash value(Main, Harrison) 101 111(Main, Brooklyn) 101 001(Park, Palo Alto) 010 010(Spring, Brooklyn) 001 001(Alma, Palo Alto) 110 010

To answer equality query on single attribute, need to look up multiple buckets. Similar in effect to grid files.

11.37Database System Concepts

Grid FilesGrid Files

Structure used to speed the processing of general multiple search-key queries involving one or more comparison operators.

The grid file has a single grid array and one linear scale for each search-key attribute. The grid array has number of dimensions equal to number of search-key attributes.

Multiple cells of grid array can point to same bucket

To find the bucket for a search-key value, locate the row and column of its cell using the linear scales and follow pointer

11.38Database System Concepts

Example Grid File for Example Grid File for accountaccount

11.39Database System Concepts

Queries on a Grid FileQueries on a Grid File

A grid file on two attributes A and B can handle queries of all following forms with reasonable efficiency

(a1 A a2)

(b1 B b2)

(a1 A a2 b1 B b2),.

E.g., to answer (a1 A a2 b1 B b2), use linear scales to find corresponding candidate grid array cells, and look up all the buckets pointed to from those cells.

11.40Database System Concepts

Grid Files (Cont.)Grid Files (Cont.)

During insertion, if a bucket becomes full, new bucket can be created if more than one cell points to it. Idea similar to extendable hashing, but on multiple dimensions

If only one cell points to it, either an overflow bucket must be created or the grid size must be increased

Linear scales must be chosen to uniformly distribute records across cells. Otherwise there will be too many overflow buckets.

Periodic re-organization to increase grid size will help. But reorganization can be very expensive.

Space overhead of grid array can be high.

R-trees (Chapter 23) are an alternative

11.41Database System Concepts

Linear HashingLinear Hashing

A dynamic hashing scheme that handles the problem of long overflow chains without using a directory.

Directory avoided in LH by using temporary overflow pages, and choosing the bucket to split in a round-robin fashion.

When any bucket overflows split the bucket that is currently pointed to by the “Next” pointer and then increment that pointer to the next bucket.

11.42Database System Concepts

Linear Hashing – The Main IdeaLinear Hashing – The Main Idea

Use a family of hash functions h0, h1, h2, ...

hi(key) = h(key) mod(2iN)

N = initial # buckets

h is some hash function

hi+1 doubles the range of hi (similar to directory doubling)

11.43Database System Concepts

Linear Hashing (Contd.)Linear Hashing (Contd.)

Algorithm proceeds in `rounds’. Current round number is “Level”.

There are NLevel (= N * 2Level) buckets at the beginning of a round

Buckets 0 to Next-1 have been split; Next to NLevel have not been split yet this round.

Round ends when all initial buckets have been split (i.e. Next = NLevel).

To start next round:

Level++;

Next = 0;

11.44Database System Concepts

LH Search AlgorithmLH Search Algorithm

To find bucket for data entry r, find hLevel(r):

If hLevel(r) >= Next (i.e., hLevel(r) is a bucket that hasn’t been involved in a split this round) then r belongs in that bucket for sure.

Else, r could belong to bucket hLevel(r) or bucket hLevel(r) + NLevel must apply hLevel+1(r) to find out.

11.45Database System Concepts

Example: Search 44 (11100), 9 (01001) Example: Search 44 (11100), 9 (01001)

0hh

1

Level=0, Next=0, N=4

00

01

10

11

000

001

010

011

PRIMARYPAGES

44* 36*32*

25*9* 5*

14* 18*10* 30*

31* 35* 11*7*

(This infois for illustrationonly!)

11.46Database System Concepts

Level=0, Next = 1, N=4

(This infois for illustrationonly!)

0hh

1

00

01

10

11

000

001

010

011

PRIMARYPAGES

OVERFLOWPAGES

00100 44*36*

32*

25*9* 5*

14*18*10*30*

31*35* 11*7* 43*

Example: Search 44 (11100), 9 (01001) Example: Search 44 (11100), 9 (01001)

11.47Database System Concepts

Linear Hashing - InsertLinear Hashing - Insert

Find appropriate bucket

If bucket to insert into is full:

Add overflow page and insert data entry.

Split Next bucket and increment Next. Note: This is likely NOT the bucket being inserted to!!!

to split a bucket, create a new bucket and use hLevel+1 to re-distribute entries.

Since buckets are split round-robin, long overflow chains don’t develop!

11.48Database System Concepts

Example: Insert 43 (101011)Example: Insert 43 (101011)

0hh

1

(This infois for illustrationonly!)

Level=0, N=4

00

01

10

11

000

001

010

011

Next=0

PRIMARYPAGES

0hh

1

Level=0

00

01

10

11

000

001

010

011

Next=1

PRIMARYPAGES

OVERFLOWPAGES

00100

44* 36*32*

25*9* 5*

14* 18*10* 30*

31* 35* 11*7*

44*36*

32*

25*9* 5*

14*18*10*30*

31*35* 11*7* 43*

(This infois for illustrationonly!)

11.49Database System Concepts

Example: End of a RoundExample: End of a Round

0hh1

22*

00

01

10

11

000

001

010

011

00100

Next=3

01

10

101

110

Level=0, Next = 3PRIMARYPAGES

OVERFLOWPAGES

32*

9*

5*

14*

25*

66* 10*18* 34*

35*31* 7* 11* 43*

44*36*

37*29*

30*

0hh1

37*

00

01

10

11

000

001

010

011

00100

10

101

110

Next=0

111

11

PRIMARYPAGES

OVERFLOWPAGES

11

32*

9* 25*

66* 18* 10* 34*

35* 11*

44* 36*

5* 29*

43*

14* 30* 22*

31*7*

50*

Insert 50 (110010)Level=1, Next = 0