tracy-webster
BTrees & Sorting
11/3
Announcements
• I hope you had a great Halloween.
• Regrade requests were due a few minutes ago…
Indexing
“If you don’t find it in the index, look very carefully through the entire catalog” -- Sears, Roebuck, and Co., Consumers Guide, 1897
Index Motivation
• A file contains some records, say products
• We want faster access to those records
– E.g., Give me all products made by Sony
• Intuition: Build a second file that organizes the records “by product” to make this faster
– NB: we don’t always have to build a second file
Indexes
• An index on a file speeds up selections on the search key fields for the index.
– Search key properties
  • Any subset of fields
  • Is not the same as the key of a relation
Product(name, maker, price)
On which attributes would you build indexes?
More precisely
• An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k.
Product(name, maker, price) Sample queries?
Indexing is one of the most important facilities a database provides for performance
Operations on an Index
• Search: Given a key, find all records
– More sophisticated variants as well. Why?
• Insert / Remove entries
– Bulk Load. Why?
Real difference between structures: the costs of these operations determine which index you pick and why
Data File with Several Index Files
Data file (records sorted by name):
Name  Age  Sal
Bob   12   10
Cal   11   80
Joe   12   20
Sue   13   75
Data entries in each index:
<Age, Sal>: (11, 80), (12, 10), (12, 20), (13, 75)
<Sal, Age>: (10, 12), (20, 12), (75, 13), (80, 11)
<Age>: 11, 12, 12, 13
<Sal>: 10, 20, 75, 80
Equality Query: Age = 12 and Sal = 90?
Range Query: Age = 5 and Sal > 5?
Composite keys in Dictionary Order
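Dictionary order on composite keys is the same lexicographic order Python uses to compare tuples, so the composite index entries from the figure can be reproduced with a plain sort. This is only an illustrative sketch: the variable names are made up here, and a sorted list stands in for a real index structure.

```python
from bisect import bisect_left

# (name, age, sal) records from the example data file.
records = [("Bob", 12, 10), ("Cal", 11, 80), ("Joe", 12, 20), ("Sue", 13, 75)]

# Data entries of the two composite indexes, kept in dictionary order.
age_sal = sorted((age, sal) for _, age, sal in records)
sal_age = sorted((sal, age) for _, age, sal in records)

# An <Age, Sal> index answers "Age = 12" with one contiguous slice,
# because all entries sharing the prefix 12 sit next to each other.
lo = bisect_left(age_sal, (12,))
hi = bisect_left(age_sal, (13,))
age_is_12 = age_sal[lo:hi]

print(age_sal)    # [(11, 80), (12, 10), (12, 20), (13, 75)]
print(sal_age)    # [(10, 12), (20, 12), (75, 13), (80, 11)]
print(age_is_12)  # [(12, 10), (12, 20)]
```

The same trick does not work for "Sal = 20" on the <Age, Sal> index: entries with a given Sal are scattered across different Age prefixes, which is why the order of fields in a composite key matters.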
High-Level View of Index Structures
Outline
• Btrees
– Very good for range queries, sorted data
– Some old databases only implemented Btrees
• Hash Tables
– There are variants of this basic structure to deal with IO
The data structures we present here are “IO aware”
B+ Trees
• Search trees
– B does not mean binary!
• Idea in B Trees:
– Make 1 node = 1 physical page
– Balanced, height-adjusted tree (not the B either)
• Idea in B+ Trees:
– Make leaves into a linked list (range queries)
Each node has >= d and <= 2d keys (except root)
B+ Trees Basics
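Parameter d = the degree
Internal Nodes
[Figure: an internal node with keys 30, 120, 240 and four child pointers, leading to keys k < 30, keys 30 <= k < 120, keys 120 <= k < 240, and keys 240 <= k.]
Leaf Nodes
Each leaf has >= d and <= 2d keys.
[Figure: a leaf with keys 40, 50, 60; each key points to its data record (40, 50, 60), and the leaf has a pointer to the next leaf.]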
B+ Tree Example
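d = 2
Root: [80]
Internal nodes: [20 60] (left of 80), [100 120 140] (right of 80)
Leaves under [20 60]: [10 15 18] -> [20 30 40 50] -> [60 65]
Leaves under [100 120 140]: [80 85 90] -> (further leaves not shown)
Data records: 10 15 18 20 30 40 50 60 65 80 85 90
1. No data in internal nodes.
2. Links between leaf pages.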
Searching a B+ Tree
• Exact key values:
– Start at the root
– Proceed down, to the leaf
• Range queries:
– As above
– Then sequential traversal
Select name
From people
Where age = 25
Select name
From people
Where 20 <= age and age <= 30
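The two search procedures can be sketched in a few lines of Python. This is a minimal illustration, not a page-based implementation: the `Leaf` and `Internal` classes and the function names are hypothetical, nodes live in memory rather than on disk pages, and the tree built at the end is a simplified two-level tree over the same keys as the lecture's example.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys      # sorted keys stored in this leaf
        self.next = None      # link to the next leaf (for range queries)

class Internal:
    def __init__(self, keys, children):
        self.keys = keys            # k separator keys
        self.children = children    # k + 1 child pointers

def find_leaf(node, key):
    """Start at the root and proceed down to the leaf that may hold key."""
    while isinstance(node, Internal):
        # children[i] covers keys[i-1] <= k < keys[i]
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(root, key):
    """Exact-match search: one root-to-leaf traversal."""
    return key in find_leaf(root, key).keys

def range_search(root, lo, hi):
    """Find the leaf for lo, then traverse the leaf links sequentially."""
    leaf, result = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return result
            if k >= lo:
                result.append(k)
        leaf = leaf.next
    return result

# Hypothetical two-level tree over the example keys.
leaves = [Leaf([10, 15, 18]), Leaf([20, 30, 40, 50]), Leaf([60, 65]), Leaf([80, 85, 90])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([20, 60, 80], leaves)

print(search(root, 30))            # True
print(range_search(root, 30, 85))  # [30, 40, 50, 60, 65, 80, 85]
```

Note that the range query touches internal nodes only once; everything after the first leaf is reached through the leaf links.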
B+ Tree Example
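Example tree (d = 2): root [80]; internal nodes [20 60] and [100 120 140]; leaves [10 15 18] -> [20 30 40 50] -> [60 65] -> [80 85 90].
Search for K = 30:
• At the root: 30 < 80, so go left.
• At [20 60]: 30 is in [20, 60), so follow the middle pointer to the leaf [20 30 40 50].
• To the data!
Search for K in [30, 85]:
• At the root: 30 < 80, so go left.
• At [20 60]: 30 is in [20, 60), so follow the middle pointer to the leaf [20 30 40 50].
• To the data! Use those leaf pointers: scan from 30 through the next leaves until a key exceeds 85.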
B+ Tree Design
• How large is d?
• Example:
– Key size = 4 bytes
– Pointer size = 8 bytes
– Block size = 4096 bytes
• 2d x 4 + (2d+1) x 8 <= 4096
• d = 170
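The inequality can be solved for d directly: 2d x key + (2d+1) x pointer <= block gives d <= (block - pointer) / (2 x (key + pointer)). A small sketch of that arithmetic, where `max_degree` is a name chosen here for illustration:

```python
def max_degree(key_size, pointer_size, block_size):
    """Largest d with 2d*key_size + (2d+1)*pointer_size <= block_size."""
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))

d = max_degree(key_size=4, pointer_size=8, block_size=4096)
print(d)                              # 170
print(2 * d * 4 + (2 * d + 1) * 8)    # 4088 bytes used out of 4096
```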
The observable universe contains ≈ 10^80 atoms. What is the height of a B+ tree that indexes it?
NB: Oracle allows 64k pages
A TiB is 2^40 bytes. What is the height to index it with 64k pages?
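One rough way to approach these questions is to count how many levels are needed before a full tree can reach every entry. The sketch below assumes completely full nodes (fanout 2d + 1), one index entry per atom or per byte, and reuses the 4-byte keys and 8-byte pointers from the design example purely for illustration; `levels_needed` and `max_fanout` are hypothetical helper names.

```python
def max_fanout(key_size, pointer_size, block_size):
    """Fanout 2d + 1 of a full node, with d chosen to fit one block."""
    d = (block_size - pointer_size) // (2 * (key_size + pointer_size))
    return 2 * d + 1

def levels_needed(num_entries, fanout):
    """Smallest h such that fanout**h >= num_entries (exact integer math)."""
    levels, reach = 0, 1
    while reach < num_entries:
        reach *= fanout
        levels += 1
    return levels

fanout_4k = max_fanout(4, 8, 4096)         # 341
fanout_64k = max_fanout(4, 8, 64 * 1024)   # 5461

print(levels_needed(10**80, fanout_4k))    # 32
print(levels_needed(10**80, fanout_64k))   # 22
print(levels_needed(2**40, fanout_64k))    # 4
```

Under these assumptions even an absurdly large data set needs only a few dozen levels, and anything that fits on real hardware needs a handful.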
B+ Trees in Practice
• Typical order: 100. Typical fill-factor: 67%.
– Average fanout = 133
• Typical capacities:
– Height 4: 133^4 = 312,900,721 records
– Height 3: 133^3 = 2,352,637 records
• Top levels of tree sit in the buffer pool:
– Level 1 = 1 page = 8 Kbytes
– Level 2 = 133 pages ≈ 1 Mbyte
– Level 3 = 17,689 pages ≈ 138 MBytes
Typically, pay for one IO!
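The capacity and buffer-pool figures follow from powers of the average fanout. A quick check, assuming the fanout of 133 and the 8 Kbyte pages stated on the slide (the function names are made up for this sketch):

```python
FANOUT = 133    # average fanout from the slide
PAGE_KB = 8     # page size from the slide

def capacity(height):
    """Number of records reachable by a tree of the given height."""
    return FANOUT ** height

def level_pages(level):
    """Number of pages on a level, with the root as level 1."""
    return FANOUT ** (level - 1)

for height in (3, 4):
    print(f"height {height}: {capacity(height):,} records")

for level in (1, 2, 3):
    pages = level_pages(level)
    print(f"level {level}: {pages:,} pages = {pages * PAGE_KB / 1024:.1f} MB")
```

Since the top three levels fit in well under 200 MB, it is reasonable to keep them cached, which leaves roughly one IO for the leaf.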
Insert!
Insertion in a B+ Tree
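Insert (K, P)
• Find leaf where K belongs, insert
• If no overflow (2d keys or less), halt
• If overflow (2d+1 keys), split node, insert in parent:
[Figure: a node with keys K1 K2 K3 K4 K5 and pointers P0 P1 P2 P3 P4 P5 splits into a left node with keys K1 K2 (pointers P0 P1 P2) and a right node with keys K4 K5 (pointers P3 P4 P5); (K3, pointer to the right node) goes to the parent.]
• If leaf, keep K3 too in right node
• When root splits, new root has 1 key only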
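The split rule differs slightly between leaves and internal nodes, which is easy to see on plain key lists. The following is a minimal sketch of just the split step (pointers and the recursive insert into the parent are omitted), with hypothetical function names:

```python
def split_leaf(keys, d):
    """Split an overflowing leaf of 2d+1 sorted keys.

    Returns (left_keys, separator, right_keys). The separator is copied
    up to the parent and also stays in the right leaf.
    """
    assert len(keys) == 2 * d + 1
    left, right = keys[:d], keys[d:]
    return left, right[0], right

def split_internal(keys, d):
    """Split an overflowing internal node of 2d+1 sorted keys.

    The middle key moves up to the parent and is kept in neither half.
    """
    assert len(keys) == 2 * d + 1
    return keys[:d], keys[d], keys[d + 1:]

print(split_leaf([20, 25, 30, 40, 50], d=2))
# ([20, 25], 30, [30, 40, 50])
print(split_internal(["K1", "K2", "K3", "K4", "K5"], d=2))
# (['K1', 'K2'], 'K3', ['K4', 'K5'])
```

Both halves end up with at least d keys, so the occupancy invariant still holds after a split.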
Insertion in a B+ Tree
Example (d = 2), starting from the example tree:
Root: [80]
Internal nodes: [20 60], [100 120 140]
Leaves: [10 15 18] -> [20 30 40 50] -> [60 65] -> [80 85 90]

Insert K = 19
• 19 belongs in the leaf [10 15 18], which has room.
• After insertion, leaves: [10 15 18 19] -> [20 30 40 50] -> [60 65] -> [80 85 90]

Now insert 25
• 25 belongs in the leaf [20 30 40 50].
• After insertion that leaf holds [20 25 30 40 50]: 2d + 1 = 5 keys.
• But now have to split!

After the split
• The leaf splits into [20 25] and [30 40 50], and 30 is inserted into the parent.
Root: [80]
Internal nodes: [20 30 60], [100 120 140]
Leaves: [10 15 18 19] -> [20 25] -> [30 40 50] -> [60 65] -> [80 85 90]
Key concepts (exam)
• How to search in a B+tree – which pages are touched
• Understanding the impact of various design decisions.
• Will not ask for the details of the insert algorithm, but do need to know it remains balanced.
Clustered Indexes
Index Classification
An index is clustered if its data entries are ordered in the same way as the underlying data records.
Clustered vs. Unclustered Index
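[Figure: CLUSTERED vs. UNCLUSTERED. In both, index entries direct the search to the data entries (index file), and the data entries point to data records (data file). In the clustered index the data records are stored in the same order as the data entries; in the unclustered index they are not.]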
Clustered (or not) dramatically impacts cost
A Simple Cost Model
Operations on an Index
Search: Given a key, find all records
– More sophisticated variants as well.
Real difference between structures: the costs of these operations determine which index you pick and why
Cost Model for Our Analysis
We ignore CPU costs, for simplicity:
– N: The number of records
– R: Number of records per page
Measure number of page I/O’s
Goal: Good enough to show the overall trends…
Clustered v. Unclustered
Fanout of Tree is F. Range query for M entries (100 per page).
IOs to search for a single item?
Clustered
– Traversal of the tree: log_F(1.5N)
– Range search query: log_F(1.5N) + ceil(M/100)
Unclustered
– Traversal of the tree: log_F(1.5N)
– Range search query: log_F(1.5N) + M
Which of these IOs are random/sequential?
Plugging in Some numbers
Clustered: log_F(1.5N) + ceil(M/100) ~ 1 Random IO (10ms)
Unclustered: log_F(1.5N) + M Random IOs (M*10ms)
If M is 1 then there is no difference!
If M is 100,000 records, ~17 minutes vs. 10ms
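The two formulas are simple enough to evaluate directly. The sketch below uses the slide's 10 ms per random IO and 100 entries per page; the values N = 1,000,000 and F = 133 are assumptions picked only to have concrete numbers, and the function names are hypothetical.

```python
RANDOM_IO_MS = 10        # cost of one random IO, from the slide
ENTRIES_PER_PAGE = 100   # entries per page, from the slide

def tree_levels(n_records, fanout):
    """ceil(log_F(1.5 N)): smallest h with fanout**h >= 1.5 * n_records."""
    levels, reach = 0, 1
    while 2 * reach < 3 * n_records:   # integer form of reach < 1.5 * N
        reach *= fanout
        levels += 1
    return levels

def clustered_range_ios(n_records, fanout, m):
    """Tree traversal plus ceil(M / 100) consecutive data pages."""
    return tree_levels(n_records, fanout) + -(-m // ENTRIES_PER_PAGE)

def unclustered_range_ios(n_records, fanout, m):
    """Tree traversal plus one data page fetch per matching entry."""
    return tree_levels(n_records, fanout) + m

N, F = 1_000_000, 133    # assumed values for illustration
for m in (1, 100_000):
    print(m, clustered_range_ios(N, F, m), unclustered_range_ios(N, F, m))

# If every unclustered fetch is a random IO:
print(100_000 * RANDOM_IO_MS / 1000 / 60, "minutes")   # about 16.7
```

The page counts differ by a factor of about 100, and the gap in elapsed time is larger still because the clustered pages can be read sequentially.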
Takeaway
• B+ trees are a workhorse index.
• You can write down a cost model.
– Databases actually do this!
• Clustered v. unclustered is a big deal.
Sorting.
Why Sort?
• Data requested in sorted order – e.g., find students in increasing GPA order
• Sorting is first step in bulk loading B+ tree index.
A classic problem in computer science!
More reasons to sort…
• Sorting useful for eliminating duplicate copies in a collection of records (Why?)
• Sort-merge join algorithm involves sorting.
• Problem: sort 1 TB of data with 1 GB of RAM.
– Why not use virtual memory?
Simplified External Sorts.
Two Ideas behind external sort
• I/O optimized: long sequential disk access
• Observation: Merging sorted files is easy
Sort small sets in memory, then merge.
Phase I: Sort with 3 Buffer Pages
1. Read a page into main memory: 44,10,33,55
2. Sort it! (Quicksort): 10,33,44,55
3. Write it back.
Phase 1, per page: 2 IOs (1 read, 1 write)
The next page, 18,8,5,30, becomes 5,8,18,30.
End: All pages sorted.
Phase II: Merge
1. Read Pages: load two sorted pages into main memory: 10,33,44,55 and 5,8,18,30
2. Merge: fill the third (output) page one value at a time: 5, then 5,8, then 5,8,10, then 5,8,10,18. The 3rd page is filled.
3. Write Back: write the output page 5,8,10,18 to disk.
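The merge step can be sketched as follows. This is an in-memory illustration under stated assumptions: a run is modeled as a list of pages, a page as a list of integers, and "reading" or "writing" a page is just list access; `merge_runs` is a hypothetical name. Only one page of each input and one output page are consumed at a time, mirroring the three buffer pages.

```python
def merge_runs(run_a, run_b, page_size):
    """Merge two sorted runs (lists of pages) into one sorted run."""
    def records(run):
        for page in run:        # "read" the next input page
            yield from page

    a, b = records(run_a), records(run_b)
    x, y = next(a, None), next(b, None)
    merged, out_page = [], []
    while x is not None or y is not None:
        # Take the smaller head record; fall back to whichever run is left.
        if y is None or (x is not None and x <= y):
            out_page.append(x)
            x = next(a, None)
        else:
            out_page.append(y)
            y = next(b, None)
        if len(out_page) == page_size:   # output page is filled: write back
            merged.append(out_page)
            out_page = []
    if out_page:                         # flush a partially filled last page
        merged.append(out_page)
    return merged

print(merge_runs([[10, 33, 44, 55]], [[5, 8, 18, 30]], page_size=4))
# [[5, 8, 10, 18], [30, 33, 44, 55]]
```

Because the inputs are only ever read front to back, the same code works unchanged when each run spans many pages.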
Keep on Merging!
3 Buffer Pages Sort
After the merge, the two pages on disk are 5,8,10,18 and 30,33,44,55.
Now, runs of length 2. If our file has 16 pages, what is next?
Two-Way External Merge Sort
Idea: Divide and conquer: sort subfiles and merge
Each pass we read + write each page in file.
N pages in the file => the number of passes is ceil(log_2(N)) + 1
So total cost is: 2N (ceil(log_2(N)) + 1)

Example with N = 7 pages (2 records per page):
Input file:            3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
PASS 0 (1-page runs):  3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
PASS 1 (2-page runs):  2,3,4,6 | 4,7,8,9 | 1,3,5,6 | 2
PASS 2 (4-page runs):  2,3,4,4,6,7,8,9 | 1,2,3,5,6
PASS 3 (8-page runs):  1,2,2,3,3,4,4,5,6,6,7,8,9
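The pass count can be checked by simulating the algorithm on the seven-page example. In this sketch each run is a flat Python list and `heapq.merge` stands in for the page-at-a-time merge, so it demonstrates the pass structure and cost formula rather than real disk IO; the function name is made up, and the input is assumed to have at least one page.

```python
import heapq

def two_way_external_sort(pages):
    """Return (sorted_records, number_of_passes) for a non-empty page list."""
    runs = [sorted(page) for page in pages]   # pass 0: sort each page
    passes = 1
    while len(runs) > 1:
        # One merge pass: combine neighbouring runs two at a time.
        runs = [list(heapq.merge(*runs[i:i + 2])) for i in range(0, len(runs), 2)]
        passes += 1
    return runs[0], passes

pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
result, passes = two_way_external_sort(pages)
n = len(pages)

print(result)              # [1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9]
print(passes)              # 4, which equals ceil(log_2(7)) + 1
print(2 * n * passes)      # 56 page IOs in total
```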
More Buffer Pages Sort
[Figure: main memory now holds several pages at once, e.g. 18,8,5,30 and 44,10,33,55.]
What if we have B+1 Buffer Pages?
1st Pass: Runs of Length B+1
Merge Phase: B-way Merge.
Sort IOs: 2N(1 + log_B(N/(B+1)))
Number of Passes of External Sort
N               B=3   B=5   B=9   B=17   B=129   B=257
100               7     4     3      2       1       1
1,000            10     5     4      3       2       2
10,000           13     7     5      4       2       2
100,000          17     9     6      5       3       3
1,000,000        20    10     7      5       3       3
10,000,000       23    12     8      6       4       3
100,000,000      26    14     9      7       4       4
1,000,000,000    30    15    10      8       5       4
Engineer’s rule of thumb: You sort in 3 passes
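The table entries can be recomputed with a short loop. One assumption matters here: the numbers are reproduced when the column label B is read as the total number of buffer pages, so that pass 0 creates runs of B pages and each merge pass is (B-1)-way. The function name is hypothetical.

```python
def number_of_passes(n_pages, buffers):
    """Passes of external merge sort with `buffers` total buffer pages."""
    runs = -(-n_pages // buffers)          # pass 0: ceil(N / B) sorted runs
    passes = 1
    while runs > 1:
        runs = -(-runs // (buffers - 1))   # one (B-1)-way merge pass
        passes += 1
    return passes

buffer_counts = (3, 5, 9, 17, 129, 257)
for n in (100, 1_000, 10_000, 100_000, 1_000_000,
          10_000_000, 100_000_000, 1_000_000_000):
    row = [number_of_passes(n, b) for b in buffer_counts]
    print(f"{n:>13,}", *(f"{p:>4}" for p in row))
```

With a few hundred buffer pages even a billion-page file needs only four passes, which is where the rule of thumb comes from.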
Other Optimizations
Can get twice as long runs
– Tournament sort (used in Postgres)
Can improve IO performance by using bigger buffers to “prefetch” or “double buffer”
– Prefetch: Hide latency
– Bigger batch sizes: Amortize expensive sequential reads and writes.