BTrees & Sorting 11/3


BTrees & Sorting

11/3


Announcements

• I hope you had a great Halloween.

• Regrade requests were due a few minutes ago…


Indexing

“If you don’t find it in the index, look very carefully through the entire catalog” -- Sears, Roebuck, and Co., Consumers Guide, 1897


Index Motivation

• A file contains some records, say products

• We want faster access to those records
  – E.g., give me all products made by Sony

• Intuition: Build a second file that organizes the records “by product” to make this faster
  – NB: we don’t always have to build a second file


Indexes

• An index on a file speeds up selections on the search key fields for the index.
  – Search key properties
    • Any subset of fields
    • Is not the same as the key of a relation

Product(name, maker, price)
On which attributes would you build indexes?


More precisely

• An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k.

Product(name, maker, price) Sample queries?

Indexing is one of the most important facilities a database provides for performance


Operations on an Index

• Search: Given a key, find all records
  – More sophisticated variants as well. Why?

• Insert/Remove entries
  – Bulk Load. Why?

Real difference between structures: the costs of the operations determine which index you pick and why


Data File with Several Index Files

Data file:

Name  Age  Sal
Bob   12   10
Cal   11   80
Joe   12   20
Sue   13   75

Index entries:

<Age, Sal>   <Sal, Age>   <Age>   <Sal>
11, 80       10, 12       11      10
12, 10       20, 12       12      20
12, 20       75, 13       12      75
13, 75       80, 11       13      80

Equality Query: Age = 12 and Sal = 90?

Range Query: Age = 5 and Sal > 5?

Composite keys in Dictionary Order
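A minimal sketch of why dictionary order matters, assuming the index is just a sorted in-memory list (a real index lives on disk pages). It builds an <Age, Sal> index over the four records above and answers a range query with two binary searches. The helper name `range_lookup` and the example query (Age = 12 and Sal > 5) are illustrative choices, not from the slides.

```python
from bisect import bisect_left, bisect_right

# Records from the slide: (name, age, sal)
records = [("Bob", 12, 10), ("Cal", 11, 80), ("Joe", 12, 20), ("Sue", 13, 75)]

# <Age, Sal> index: composite keys sorted in dictionary (lexicographic) order,
# each paired with the position (rid) of its record in the data file.
age_sal_index = sorted(((age, sal), rid) for rid, (_, age, sal) in enumerate(records))

def range_lookup(index, low, high):
    """Return the rids of entries whose composite key k satisfies low <= k <= high."""
    keys = [key for key, _ in index]
    start = bisect_left(keys, low)
    stop = bisect_right(keys, high)
    return [rid for _, rid in index[start:stop]]

# Age = 12 and Sal > 5: the matches are contiguous in <Age, Sal> order.
matches = range_lookup(age_sal_index, (12, 6), (12, float("inf")))
names = [records[rid][0] for rid in matches]
```

Here `names` is `["Bob", "Joe"]`. The same query against a <Sal, Age> index would not be one contiguous slice, because entries with Age = 12 are scattered among the salary values.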


High-Level View of Index Structures


Outline

• Btrees
  – Very good for range queries, sorted data
  – Some old databases only implemented Btrees

• Hash Tables
  – There are variants of this basic structure to deal with IO

The data structures we present here are “IO aware”


B+ Trees

• Search trees
  – B does not mean binary!

• Idea in B Trees:
  – Make 1 node = 1 physical page
  – Balanced, height-adjusted tree (not the B either)

• Idea in B+ Trees:
  – Make leaves into a linked list (range queries)


B+ Trees Basics

Parameter d = the degree

Internal Nodes
Each node has >= d and <= 2d keys (except root)

  [ 30 | 120 | 240 ]
  Child pointers, left to right:
    Keys k < 30
    Keys 30 <= k < 120
    Keys 120 <= k < 240
    Keys 240 <= k

Leaf Nodes
Each leaf has >= d and <= 2d keys:

  [ 40 | 50 | 60 ] -> Next leaf
  Each key (40, 50, 60) points to its data record.


B+ Tree Example

d = 2

Root:      [ 80 ]
Internal:  [ 20 | 60 ]   [ 100 | 120 | 140 ]
Leaves:    [ 10 15 18 ] -> [ 20 30 40 50 ] -> [ 60 65 ] -> [ 80 85 90 ]
Data:      10 15 18 20 30 40 50 60 65 80 85 90

1. No data in internal nodes.

2. Links between leaf pages.


Searching a B+ Tree

• Exact key values:
  – Start at the root
  – Proceed down, to the leaf

• Range queries:
  – As above
  – Then sequential traversal

Select name
From people
Where age = 25

Select name
From people
Where 20 <= age and age <= 30
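The two search procedures can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not a paged implementation: the class names `Leaf` and `Internal`, the function names, and the small two-level tree (built from leaf keys 10 through 90 in the style of the example slides) are all hypothetical.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None  # link to the next leaf (what makes it a B+ tree)

class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys
        self.children = children  # len(children) == len(keys) + 1

def find_leaf(node, key):
    """Walk from the root to the leaf whose key range contains `key`."""
    while isinstance(node, Internal):
        # Child i covers keys k with keys[i-1] <= k < keys[i].
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(root, key):
    """Exact-match search: one root-to-leaf walk."""
    return key in find_leaf(root, key).keys

def range_search(root, low, high):
    """Find the leaf for `low`, then follow leaf links until a key exceeds `high`."""
    leaf, out = find_leaf(root, low), []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return out
            if k >= low:
                out.append(k)
        leaf = leaf.next
    return out

# Hypothetical two-level tree with four linked leaves.
leaves = [Leaf([10, 15, 18]), Leaf([20, 30, 40, 50]), Leaf([60, 65]), Leaf([80, 85, 90])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([20, 60, 80], leaves)
```

With this tree, `search(root, 30)` is `True` and `range_search(root, 30, 85)` returns `[30, 40, 50, 60, 65, 80, 85]`. Only the first step descends the tree; the rest is a sequential walk along the leaf list.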


B+ Tree Example

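Root:      [ 80 ]
Internal:  [ 20 | 60 ]   [ 100 | 120 | 140 ]
Leaves:    [ 10 15 18 ] -> [ 20 30 40 50 ] -> [ 60 65 ] -> [ 80 85 90 ]
Data:      10 15 18 20 30 40 50 60 65 80 85 90

K = 30?

1. At the root: 30 < 80.
2. At [ 20 | 60 ]: 30 in [20,60).
3. At leaf [ 20 30 40 50 ]: To the data!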


B+ Tree Example

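Root:      [ 80 ]
Internal:  [ 20 | 60 ]   [ 100 | 120 | 140 ]
Leaves:    [ 10 15 18 ] -> [ 20 30 40 50 ] -> [ 60 65 ] -> [ 80 85 90 ]
Data:      10 15 18 20 30 40 50 60 65 80 85 90

K in [30,85]

1. At the root: 30 < 80.
2. At [ 20 | 60 ]: 30 in [20,60).
3. At leaf [ 20 30 40 50 ]: To the data! Use those leaf pointers!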


B+ Tree Design

• How large is d?
• Example:
  – Key size = 4 bytes
  – Pointer size = 8 bytes
  – Block size = 4096 bytes

• 2d x 4 + (2d+1) x 8 <= 4096
• d = 170

The observable universe contains ≈ 10^80 atoms. What is the height of a B+ tree that indexes it?

NB: Oracle allows 64k pages

A TiB is 2^40 bytes. What is the height to index it with 64k pages?
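The sizing arithmetic can be checked with a short sketch. Everything beyond the slide's inequality is an assumption made for illustration: the function names are hypothetical, every node is taken to be full (fanout 2d + 1), and for the TiB question the index is assumed to hold one entry per 64 KiB data page with the same 4-byte keys and 8-byte pointers.

```python
def max_degree(block_size, key_size, pointer_size):
    """Largest d with 2d*key_size + (2d+1)*pointer_size <= block_size."""
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))

def levels_needed(num_entries, fanout):
    """Smallest h with fanout**h >= num_entries (exact integer arithmetic)."""
    h, capacity = 0, 1
    while capacity < num_entries:
        capacity *= fanout
        h += 1
    return h

d_4k = max_degree(4096, 4, 8)                     # 4 KiB blocks
atoms_height = levels_needed(10**80, 2 * d_4k + 1)

d_64k = max_degree(64 * 1024, 4, 8)               # 64 KiB blocks
pages_in_tib = 2**40 // (64 * 1024)               # 2**24 data pages
tib_height = levels_needed(pages_in_tib, 2 * d_64k + 1)
```

Under these assumptions `d_4k` is 170, `atoms_height` is 32, `d_64k` is 2730, and `tib_height` is 2. The point of the exercise is that height grows only logarithmically in the number of entries, with a very large base.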


B+ Trees in Practice

• Typical order: 100. Typical fill-factor: 67%.
  – Average fanout = 133

• Typical capacities:
  – Height 4: 133^4 = 312,900,721 records
  – Height 3: 133^3 = 2,352,637 records

• Top levels of tree sit in the buffer pool:
  – Level 1 = 1 page = 8 Kbytes
  – Level 2 = 133 pages ≈ 1 Mbyte
  – Level 3 = 17,689 pages ≈ 133 MBytes

Typically, pay for one IO!


Insert!


Insertion in a B+ Tree

Insert (K, P)
• Find leaf where K belongs, insert
• If no overflow (2d keys or less), halt
• If overflow (2d+1 keys), split node, insert in parent:

  Before:  keys      K1 K2 K3 K4 K5
           pointers  P0 P1 P2 P3 P4 P5

  After:   left node   keys K1 K2, pointers P0 P1 P2
           right node  keys K4 K5, pointers P3 P4 P5
           (K3, pointer to the right node) goes to the parent

• If leaf, keep K3 too in right node
• When root splits, new root has 1 key only
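The leaf case of the split rule can be sketched as follows. This is a simplified illustration that handles keys only (no record pointers, no parent update); the function name `insert_into_leaf` and its return convention are assumptions made for the example.

```python
from bisect import insort

def insert_into_leaf(keys, key, d):
    """Insert `key` into a sorted leaf; split if it overflows past 2d keys.

    Returns (left_keys, right_keys, separator). When no split is needed,
    right_keys and separator are None.
    """
    keys = list(keys)
    insort(keys, key)
    if len(keys) <= 2 * d:
        return keys, None, None
    # Overflow: 2d+1 keys. The left leaf keeps the first d keys. Because this
    # is a leaf, the right leaf keeps the middle key too, and a copy of that
    # middle key is what gets inserted into the parent.
    left, right = keys[:d], keys[d:]
    return left, right, right[0]

no_split = insert_into_leaf([10, 15, 18], 19, d=2)
split = insert_into_leaf([20, 30, 40, 50], 25, d=2)
```

With d = 2, `no_split` is `([10, 15, 18, 19], None, None)` and `split` is `([20, 25], [30, 40, 50], 30)`: the key 30 stays in the right leaf and is also copied up. For an internal node the middle key would move up instead of being copied.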


Insertion in a B+ Tree

Root:      [ 80 ]
Internal:  [ 20 | 60 ]   [ 100 | 120 | 140 ]
Leaves:    [ 10 15 18 ] -> [ 20 30 40 50 ] -> [ 60 65 ] -> [ 80 85 90 ]
Data:      10 15 18 20 30 40 50 60 65 80 85 90

Insert K=19


Insertion in a B+ Tree

Root:      [ 80 ]
Internal:  [ 20 | 60 ]   [ 100 | 120 | 140 ]
Leaves:    [ 10 15 18 19 ] -> [ 20 30 40 50 ] -> [ 60 65 ] -> [ 80 85 90 ]
Data:      10 15 18 19 20 30 40 50 60 65 80 85 90

After insertion


Insertion in a B+ Tree

Root:      [ 80 ]
Internal:  [ 20 | 60 ]   [ 100 | 120 | 140 ]
Leaves:    [ 10 15 18 19 ] -> [ 20 30 40 50 ] -> [ 60 65 ] -> [ 80 85 90 ]
Data:      10 15 18 19 20 30 40 50 60 65 80 85 90

Now insert 25


Insertion in a B+ Tree

Root:      [ 80 ]
Internal:  [ 20 | 60 ]   [ 100 | 120 | 140 ]
Leaves:    [ 10 15 18 19 ] -> [ 20 25 30 40 50 ] -> [ 60 65 ] -> [ 80 85 90 ]
Data:      10 15 18 19 20 25 30 40 50 60 65 80 85 90

After insertion


Insertion in a B+ Tree

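Root:      [ 80 ]
Internal:  [ 20 | 60 ]   [ 100 | 120 | 140 ]
Leaves:    [ 10 15 18 19 ] -> [ 20 25 30 40 50 ] -> [ 60 65 ] -> [ 80 85 90 ]
Data:      10 15 18 19 20 25 30 40 50 60 65 80 85 90

But now have to split! (The leaf [ 20 25 30 40 50 ] has 5 keys, more than 2d = 4.)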


Insertion in a B+ Tree

Root:      [ 80 ]
Internal:  [ 20 | 30 | 60 ]   [ 100 | 120 | 140 ]
Leaves:    [ 10 15 18 19 ] -> [ 20 25 ] -> [ 30 40 50 ] -> [ 60 65 ] -> [ 80 85 90 ]
Data:      10 15 18 19 20 25 30 40 50 60 65 80 85 90

After the split


Key concepts (exam)

• How to search in a B+ tree
  – Which pages are touched

• Understanding the impact of various design decisions.

• Will not ask for the details of the insert algorithm, but do need to know it remains balanced.


Clustered Indexes


Index Classification

An index is clustered if its data entries are ordered in the same way as the underlying data records.


Clustered vs. Unclustered Index

[Figure: two panels, CLUSTERED and UNCLUSTERED. In each, index entries direct the search to the data entries (index file), and the data entries point to data records (data file).]

Clustered (or not) dramatically impacts cost


A Simple Cost Model


Operations on an Index

Search: Given a key, find all records
  – More sophisticated variants as well.

Real difference between structures: the costs of the operations determine which index you pick and why


Cost Model for Our Analysis

We ignore CPU costs, for simplicity:
  – N: The number of records
  – R: Number of records per page

Measure number of page I/O’s

Goal: Good enough to show the overall trends…


Clustered v. Unclustered

Fanout of tree is F. Range query for M entries (100 per page).

IOs to search for a single item?

Clustered
  Traversal of the tree: $\log_F(1.5N)$
  Range search query: $\log_F(1.5N) + \lceil M/100 \rceil$

Unclustered
  Traversal of the tree: $\log_F(1.5N)$
  Range search query: $\log_F(1.5N) + M$

Which of these IOs are random/sequential?


Plugging in Some numbers

Clustered: $\log_F(1.5N) + \lceil M/100 \rceil$ ~ 1 Random IO (10ms), then sequential reads

Unclustered: $\log_F(1.5N) + M$ Random IO (M*10ms)

If M is 1 then there is no difference!
If M is 100,000 records: ~17 minutes (100,000 x 10ms = 1,000s) of random IO vs. ~10ms of random IO
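The cost model is simple enough to write down as code. This is a sketch under stated assumptions: the function names are hypothetical, the tree traversal term is rounded up to a whole number of page reads, and 10 ms per random IO is the slide's figure.

```python
import math

def tree_traversal_ios(n_records, fanout):
    """Page reads to walk from the root to a leaf: ceil(log_F(1.5 N))."""
    return math.ceil(math.log(1.5 * n_records, fanout))

def clustered_range_ios(n_records, fanout, m, entries_per_page=100):
    """Traverse once, then read the matching pages in order."""
    return tree_traversal_ios(n_records, fanout) + math.ceil(m / entries_per_page)

def unclustered_range_ios(n_records, fanout, m):
    """Traverse once, then fetch each of the m records with its own page read."""
    return tree_traversal_ios(n_records, fanout) + m

def random_io_seconds(n_ios, ms_per_io=10):
    """Time for n_ios random page reads at ms_per_io milliseconds each."""
    return n_ios * ms_per_io / 1000

# Example with N = 1,000,000 records and fanout F = 133 (illustrative values).
n, f, m = 1_000_000, 133, 100_000
clustered = clustered_range_ios(n, f, m)
unclustered = unclustered_range_ios(n, f, m)
unclustered_fetch_time = random_io_seconds(m)
```

For these values the traversal costs 3 page reads either way, `clustered` is 1,003, `unclustered` is 100,003, and `unclustered_fetch_time` is 1,000 seconds. The traversal term is identical in both cases, so the whole gap comes from how the M matching records are laid out on disk.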


Takeaway

• B+ trees are a workhorse index.

• You can write down a cost model.
  – Databases actually do this!

• Clustered v. unclustered is a big deal.


Sorting.


Why Sort?

• Data requested in sorted order
  – E.g., find students in increasing GPA order

• Sorting is first step in bulk loading B+ tree index.

A classic problem in computer science!


More reasons to sort…

• Sorting useful for eliminating duplicate copies in a collection of records (Why?)

• Sort-merge join algorithm involves sorting.

• Problem: sort 1 TB of data with 1 GB of RAM.
  – Why not use virtual memory?


Do people care?

Sort benchmark bears his name

http://sortbenchmark.org


Simplified External Sorts.


Two Ideas behind external sort

• I/O optimized: long sequential disk access

• Observation: Merging sorted files is easy

Sort small sets in memory, then merge.


Phase I: Buffer with 3 Pages Sort

Main Memory

1. Read a page:          44,10,33,55
2. Sort it! (Quicksort)  10,33,44,55
3. Write it back.

Phase 1, per page: 2 IOs (1 Read, 1 Write)

Next page: 18,8,5,30 becomes 5,8,18,30

End: All pages sorted.


Phase II: Merge

Sorted pages on disk: 10,33,44,55 and 5,8,18,30

Main Memory (two input pages, one output page)

1. Read Pages: bring 10,33,44,55 and 5,8,18,30 into the two input pages.

2. Merge: repeatedly move the smaller front element to the output page.
   Output page: 5
   Output page: 5,8
   Output page: 5,8,10
   Output page: 5,8,10,18
   3rd Page is filled

3. Write Back: write 5,8,10,18 to disk.

Keep on Merging!


3 Buffer Pages Sort

Main Memory

Merged run on disk: 5,8,10,18 | 30,33,44,55

Now, runs of length 2. If our file has 16 pages, what is next?



Two-Way External Merge Sort

Idea: Divide and conquer: sort subfiles and merge

Each pass we read + write each page in file.
N pages in the file => the number of passes is $\lceil \log_2 N \rceil + 1$

So total cost is: $2N(\lceil \log_2 N \rceil + 1)$

Example (| separates runs; each comma-joined group is one page):

Input file             3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
PASS 0: 1-page runs    3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
PASS 1: 2-page runs    2,3 4,6 | 4,7 8,9 | 1,3 5,6 | 2
PASS 2: 4-page runs    2,3 4,4 6,7 8,9 | 1,2 3,5 6
PASS 3: 8-page runs    1,2 2,3 3,4 4,5 6,6 7,8 9
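The pass structure can be traced with a small simulation. It is a sketch only: pages are plain Python lists, each merge flattens its two runs in memory instead of streaming them through three buffer pages, and the function names are hypothetical. It models how many passes happen and what each produces, not the actual IO.

```python
def merge_runs(a, b, page_size):
    """Merge two sorted runs (each a list of pages) into one sorted run."""
    left = [x for page in a for x in page]
    right = [x for page in b for x in page]
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged += left[i:] + right[j:]
    # Cut the merged stream back into pages.
    return [merged[k:k + page_size] for k in range(0, len(merged), page_size)]

def two_way_external_sort(pages, page_size=2):
    """Return (sorted run as a list of pages, number of passes)."""
    # Pass 0: sort every page on its own, giving 1-page runs.
    runs = [[sorted(page)] for page in pages]
    passes = 1
    # Each later pass merges neighbouring runs pairwise.
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                merged.append(merge_runs(runs[k], runs[k + 1], page_size))
            else:
                merged.append(runs[k])  # odd run out is carried forward
        runs = merged
        passes += 1
    return runs[0], passes

# The 7-page input file from the slide.
input_file = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
sorted_pages, passes = two_way_external_sort(input_file)
total_ios = 2 * len(input_file) * passes
```

For the 7-page file this takes 4 passes (PASS 0 through PASS 3), ends with the pages `[1, 2], [2, 3], [3, 4], [4, 5], [6, 6], [7, 8], [9]`, and the cost formula gives 2 x 7 x 4 = 56 page IOs.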


More Buffer Pages Sort

Main Memory (now holds several pages at once, e.g., 18,8,5,30 and 44,10,33,55)

What if we have B+1 Buffer Pages?

1st Pass: Runs of Length B+1

Merge Phase: B-way Merge.

Sort IOs: $2N(1 + \lceil \log_B(N/(B+1)) \rceil)$


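Number of Passes of External Sort

N               B=3   B=5   B=9   B=17   B=129   B=257
100               7     4     3      2       1       1
1,000            10     5     4      3       2       2
10,000           13     7     5      4       2       2
100,000          17     9     6      5       3       3
1,000,000        20    10     7      5       3       3
10,000,000       23    12     8      6       4       3
100,000,000      26    14     9      7       4       4
1,000,000,000    30    15    10      8       5       4

(The numbers in this table correspond to B being the total number of buffer pages: initial runs of B pages, then (B-1)-way merges.)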
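The pass counts can be recomputed with a few lines of integer arithmetic. One assumption is needed to match the table's numbers: B is read as the total number of buffer pages, so pass 0 makes runs of B pages and each merge pass is (B-1)-way. The function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def num_passes(n_pages, buffers):
    """Passes of external merge sort with `buffers` buffer pages in total."""
    runs = ceil_div(n_pages, buffers)        # pass 0: sorted runs of `buffers` pages
    passes = 1
    while runs > 1:
        runs = ceil_div(runs, buffers - 1)   # one (buffers-1)-way merge pass
        passes += 1
    return passes

sizes = [10**k for k in range(2, 10)]        # N = 100 ... 1,000,000,000
column_b3 = [num_passes(n, 3) for n in sizes]
column_b257 = [num_passes(n, 257) for n in sizes]
```

`column_b3` comes out as `[7, 10, 13, 17, 20, 23, 26, 30]` and `column_b257` as `[1, 2, 2, 3, 3, 3, 4, 4]`, matching the first and last columns. Multiplying a pass count by 2N gives the total page IOs.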

Engineer’s rule of thumb: You sort in 3 passes


Other Optimizations

Can get twice as long runs
  – Tournament sort (used in Postgres)

Can improve IO performance by using bigger buffers to “prefetch” or “double buffer”
  – Prefetch: Hide latency
  – Bigger Batch Sizes: Amortize expensive sequential reads and writes.
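One way to get longer initial runs is heap-based replacement selection, sketched below on the assumption that this is the technique the "tournament sort" bullet refers to. It is an in-memory illustration with a hypothetical function name, not a description of any particular database's implementation.

```python
import heapq
from itertools import islice

_DONE = object()  # sentinel marking the end of the input

def replacement_selection(records, memory_size):
    """Split `records` into sorted runs using a heap of `memory_size` items."""
    it = iter(records)
    # Heap entries are (run_number, value). Items tagged for the next run
    # sort after everything still eligible for the current run.
    heap = [(0, value) for value in islice(it, memory_size)]
    heapq.heapify(heap)

    runs = []
    while heap:
        run_no, value = heapq.heappop(heap)
        if run_no == len(runs):
            runs.append([])                  # first item of a new run
        runs[run_no].append(value)
        incoming = next(it, _DONE)
        if incoming is not _DONE:
            # A new record can still join the current run only if it is
            # not smaller than the value just written out.
            tag = run_no if incoming >= value else run_no + 1
            heapq.heappush(heap, (tag, incoming))
    return runs

runs = replacement_selection([44, 10, 33, 55, 18, 8, 5, 30], memory_size=3)
```

With room for only 3 records, this input yields the two runs `[10, 33, 44, 55]` and `[5, 8, 18, 30]`, each longer than memory. On randomly ordered input the runs average about twice the memory size, which is where the "twice as long" figure comes from.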