80
CSC 261/461 Database Systems Lecture 16 Spring 2017 MW 3:25 pm – 4:40 pm January 18 – May 3 Dewey 1101

CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

CSC 261/461 – Database SystemsLecture 16

Spring 2017MW 3:25 pm – 4:40 pm

January 18 – May 3Dewey 1101

Page 2: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Announcement

• Project 1 Milepost 3 is out

CSC261,Spring2017,UR

Page 3: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

File Types

• Unordered Records (Heap Files)

• Ordered Records (Sorted Files)

• Hash Files

CSC261,Spring2017,UR

Page 4: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Average Access Times for a File of b Blocks under Basic File Organizations

Page 5: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

1. EXTERNAL MERGE SORT

5

Page 6: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

What you will learn about in this section

1. Externalmergesort

2. Externalmergesortonlargerfiles

3. Optimizationsforsorting

6

Page 7: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Recap: External Merge Algorithm

• Suppose we want to merge two sorted files both much larger than main memory (i.e. the buffer)

• We can use the external merge algorithm to merge files of arbitrary length in 2*(N+M) IO operations with only 3 buffer pages!

Page 8: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

External Merge Sort

Page 9: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Why are Sort Algorithms Important?

• Data requested from DB in sorted order is extremely common– e.g., find students in increasing GPA order

• Why not just use quicksort in main memory??–What about if we need to sort 1TB of data with 1GB of

RAM…

Aclassicproblemincomputerscience!

Page 10: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

So how do we sort big files?

1. Split into chunks small enough to sort in memory (“runs”)

2. Merge pairs (or groups) of runs using the external merge algorithm

3. Keep merging the resulting runs (each time = a “pass”) until left with one sorted file!

Page 11: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

External Merge Sort Algorithm

27,24 3,1

Example:• 3Buffer

pages• 6-pagefile

Disk MainMemory

Buffer

18,22

F1

F2

33,12 55,3144,10

1. Split into chunks small enough to sort in memory

Orangefile=unsorted

Page 12: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

External Merge Sort Algorithm

27,24 3,1

Disk MainMemory

Buffer

18,22

F1

F2

33,12 55,3144,10

1. Split into chunks small enough to sort in memory

Example:• 3Buffer

pages• 6-pagefile

Orangefile=unsorted

Page 13: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

External Merge Sort Algorithm

27,24 3,1

Disk

MainMemory

Buffer

18,22

F1

F233,12 55,3144,10

1. Split into chunks small enough to sort in memory

Example:• 3Buffer

pages• 6-pagefile

Orangefile=unsorted

Page 14: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

External Merge Sort Algorithm

27,24 3,1

Disk

MainMemory

Buffer

18,22

F1

F231,33 44,5510,12

Example:• 3Buffer

pages• 6-pagefile

1. Split into chunks small enough to sort in memory

Orangefile=unsorted

Page 15: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

External Merge Sort Algorithm

Disk MainMemory

BufferF1

F2

31,33 44,5510,12

AndsimilarlyforF2

27,24 3,118,2218,22 24,271,3

1. Splitintochunkssmallenoughtosortinmemory

Example:• 3Buffer

pages• 6-pagefileEachsortedfileisacalledarun

Page 16: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

External Merge Sort Algorithm

Disk MainMemory

BufferF1

F2

2. Now just run the external merge algorithm & we’re done!

31,33 44,5510,12

18,22 24,271,3

Example:• 3Buffer

pages• 6-pagefile

Page 17: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Calculating IO Cost

For 3 buffer pages, 6 page file:

1. Split into two 3-page files and sort in memory 1. = 1 R + 1 W for each file = 2*(3 + 3) = 12 IO operations

2. Merge each pair of sorted chunks using the external merge algorithm 1. = 2*(3 + 3) = 12 IO operations

3. Total cost = 24 IO

Page 18: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Running External Merge Sort on Larger Files

Disk

31,33 44,5510,12

18,43 24,2745,38

Assumewestillonlyhave3 bufferpages(Buffernotpictured)

31,33 47,5510,12

18,22 23,2041,3

31,33 39,5542,46

18,23 24,271,3

48,33 44,4010,12

18,22 24,2716,31

Page 19: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Running External Merge Sort on Larger Files

Disk

31,33 44,5510,12

18,43 24,2745,38

31,33 47,5510,12

18,22 23,2041,3

31,33 39,5542,46

18,23 24,271,3

48,33 44,4010,12

18,22 24,2716,31

1.Splitintofilessmallenoughtosortinbuffer…

Assumewestillonlyhave3 bufferpages(Buffernotpictured)

Page 20: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Running External Merge Sort on Larger Files

Disk

31,33 44,5510,12

27,38 43,4518,24

31,33 47,5510,12

20,22 23,413,18

39,42 46,5531,33

18,23 24,271,3

33,40 44,4810,12

22,24 27,3116,18

1.Splitintofilessmallenoughtosortinbuffer…andsort

Assumewestillonlyhave3 bufferpages(Buffernotpictured)

Calleachofthesesortedfilesarun

Page 21: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Running External Merge Sort on Larger Files

Disk

31,33 44,5510,12

27,38 43,4518,24

31,33 47,5510,12

20,22 23,413,18

39,42 46,5531,33

18,23 24,271,3

33,40 44,4810,12

22,24 27,3116,18

2.Nowmergepairsof(sorted)files…theresultingfileswillbesorted!

Disk

18,24 27,3110,12

43,44 45,5533,38

12,18 20,223,10

33,41 47,5523,31

18,23 24,271,3

39,42 46,5531,33

16,18 22,2410,12

33,40 44,4827,31

Assumewestillonlyhave3 bufferpages(Buffernotpictured)

Page 22: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Running External Merge Sort on Larger Files

Disk

31,33 44,5510,12

27,38 43,4518,24

31,33 47,5510,12

20,22 23,413,18

39,42 46,5531,33

18,23 24,271,3

33,40 44,4810,12

22,24 27,3116,18

3.Andrepeat…

Disk

18,24 27,3110,12

43,44 45,5533,38

12,18 20,223,10

33,41 47,5523,31

18,23 24,271,3

39,42 46,5531,33

16,18 22,2410,12

33,40 44,4827,31

Disk

10,12 12,183,10

22,23 24,2718,20

33,33 38,4131,31

45,47 55,5543,44

10,12 16,181,3

23,24 24,2718,22

31,33 33,3927,31

44,46 48,5540,42

Assumewestillonlyhave3 bufferpages(Buffernotpictured)

Calleachofthesestepsapass

Page 23: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Running External Merge Sort on Larger Files

Disk

31,33 44,5510,12

27,38 43,4518,24

31,33 47,5510,12

20,22 23,413,18

39,42 46,5531,33

18,23 24,271,3

33,40 44,4810,12

22,24 27,3116,18

4.Andrepeat!

Disk

18,24 27,3110,12

43,44 45,5533,38

12,18 20,223,10

33,41 47,5523,31

18,23 24,271,3

39,42 46,5531,33

16,18 22,2410,12

33,40 44,4827,31

Disk

10,12 12,183,10

22,23 24,2718,20

33,33 38,4131,31

45,47 55,5543,44

10,12 16,181,3

23,24 24,2718,22

31,33 33,3927,31

44,46 48,5540,42

Disk

3,10 10,101,3

12,16 18,1812,12

20,22 22,2318,18

24,24 27,2723,24

31,31 31,3327,31

33,38 39,4033,33

43,44 44,4541,42

48,55 55,5546,47

Page 24: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Simplified 3-page Buffer Version

Assume for simplicity that we split an N-page file into N single-page runs and sort these; then:

• First pass: Merge N/2 pairs of runs each of length 1 page

• Second pass: Merge N/4 pairs of runs each of length 2 pages

• In general, for N pages, we do 𝒍𝒐𝒈𝟐 𝑵 passes– +1 for the initial split & sort

• Each pass involves reading in & writing out all the pages = 2N IO

Unsortedinputfile

Split&sort

Merge

Merge

Sorted!

à 2N*( 𝒍𝒐𝒈𝟐 𝑵 +1)totalIOcost!

Page 25: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Using B+1 buffer pages to reduce # of passes

Suppose we have B+1 buffer pages now; we can:

1. Increase length of initial runs. Sort B+1 at a time!At the beginning, we can split the N pages into runs of length B+1 and sort these in memory

2𝑁( log, 𝑁 + 1)

IOCost:

Startingwithrunsoflength1

2𝑁( log,𝑵

𝑩 + 𝟏 + 1)

StartingwithrunsoflengthB+1

Page 26: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Using B+1 buffer pages to reduce # of passes

Suppose we have B+1 buffer pages now; we can:

2. Perform a B-way merge. On each pass, we can merge groups of B runs at a time (vs. merging pairs of runs)!

IOCost:

2𝑁( log, 𝑁 + 1) 2𝑁( log,𝑵

𝑩 + 𝟏 + 1)

Startingwithrunsoflength1

StartingwithrunsoflengthB+1

2𝑁( log2𝑵

𝑩 + 𝟏 + 1)

PerformingB-waymerges

Page 27: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

INDEXING

27

Page 28: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

What you will learn about in this section

1. Indexes:Motivation

2. Indexes:Basics

28

Page 29: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Index Motivation

• Suppose we want to search for people of a specific age

• First idea: Sort the records by age… we know how to do this fast!

• How many IO operations to search over N sorted records?– Simple scan: O(N)– Binary search: O(𝐥𝐨𝐠𝟐 𝑵)

Person(name, age)

Couldwegetevencheapersearch?E.g.gofrom𝐥𝐨𝐠𝟐 𝑵à 𝐥𝐨𝐠𝟐𝟎𝟎 𝑵?

Page 30: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Index Motivation

• What about if we want to insert a new person, but keep the list sorted?

• We would have to potentially shift N records, requiring up to ~ 2*N/P IO operations (where P = # of records per page)!– We could leave some “slack” in the pages…

4,5 6,71,3 3,4 5,61,2

2

7,

Couldwegetfasterinsertions?

Page 31: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Index Motivation

• What about if we want to be able to search quickly along multiple attributes (e.g. not just age)?–We could keep multiple copies of the records, each sorted

by one attribute set… this would take a lot of space

Canwegetfastsearchovermultipleattribute(sets)withouttakingtoomuchspace?

We’llcreateseparatedatastructurescalledindexes toaddressallthesepoints

Page 32: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Further Motivation for Indexes: NoSQL!

• NoSQL engines are (basically) just indexes!

– A lot more is left to the user in NoSQL… one of the primary remaining functions of the DBMS is still to provide index over the data records, for the reasons we just saw!

– Sometimes use B+ Trees, sometimes hash indexes

IndexesarecriticalacrossallDBMStypes

Page 33: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Indexes: High-level

• An index on a file speeds up selections on the search key fields for the index.– Search key properties

• Any subset of fields• is not the same as key of a relation

• Example: Onwhichattributeswouldyoubuild

indexes?Product(name, maker, price)

Page 34: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

More precisely

• An index is a data structure mapping search keys to sets of rows in a database table

– Provides efficient lookup & retrieval by search key value-usually much faster than searching through all the rows of the database table

• An index can store:– Full rows it points to (primary index) or – Pointers to those rows (secondary index)

Page 35: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Operations on an Index

• Search: Quickly find all records which meet some conditionon the search key attributes–More sophisticated variants as well. Why?

Indexingisonethemostimportantfeaturesprovidedbyadatabaseforperformance

Page 36: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Conceptual Example

Whatifwewanttoreturnallbookspublishedafter1867?Theabovetablemightbeveryexpensivetosearchoverrow-by-row…

SELECT *FROM Russian_NovelsWHERE Published > 1867

BID Title Author Published Full_text

001 WarandPeace Tolstoy 1869 …

002 CrimeandPunishment

Dostoyevsky 1866 …

003 AnnaKarenina Tolstoy 1877 …

Russian_Novels

Page 37: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Conceptual Example

BID Title Author Published Full_text

001 WarandPeace Tolstoy 1869 …

002 CrimeandPunishment

Dostoyevsky 1866 …

003 AnnaKarenina Tolstoy 1877 …

Published BID

1866 002

1869 001

1877 003

Maintainanindexforthis,andsearchoverthat!

Russian_NovelsBy_Yr_Index

Whymightjustkeepingthetablesortedbyyearnotbegoodenough?

Page 38: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Conceptual Example

BID Title Author Published Full_text

001 WarandPeace Tolstoy 1869 …

002 CrimeandPunishment

Dostoyevsky 1866 …

003 AnnaKarenina Tolstoy 1877 …

Published BID

1866 002

1869 001

1877 003

Indexesshownhereastables,butinrealitywewillusemoreefficientdatastructures…

Russian_NovelsBy_Yr_Index

Author Title BID

Dostoyevsky Crime andPunishment

002

Tolstoy AnnaKarenina 003

Tolstoy War andPeace 001

By_Author_Title_Index Canhavemultipleindexestosupportmultiplesearchkeys

Page 39: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Covering Indexes

Published BID

1866 002

1869 001

1877 003

By_Yr_Index

Wesaythatanindexiscovering foraspecificquery iftheindexcontainsalltheneededattributes-meaningthequerycanbeansweredusingtheindexalone!

The“needed”attributesaretheunionofthoseintheSELECTandWHEREclauses…

SELECT Published, BIDFROM Russian_NovelsWHERE Published > 1867

Example:

Page 40: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Primary Indexes: Index for Sorted (Ordered) Files

CSC261,Spring2017,UR

Page 41: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Clustering Indexes (Index for Sorted (on non-key) Files)

CSC261,Spring2017,UR

Page 42: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Secondary Indexes (on a key field)

• Secondary means of accessing a data file

• File records could be ordered, unordered, or hashed

CSC261,Spring2017,UR

Page 43: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Secondary Indexes (on a non-key field)Extra level of indirection

• Provides logical ordering– Though records are not

physically ordered

CSC261,Spring2017,UR

Page 44: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

High-level Categories of Index Types

• Multilevel Indexes– Very good for range queries, sorted data– Some old databases only implemented B-Trees– We will mostly look at a variant called B+ Trees

• Hash Tables – Very good for searching

Realdifferencebetweenstructures:costsofopsdetermineswhichindexyoupickandwhy

Page 45: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

MULTILEVEL INDEXES

45

Page 46: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

What you will learn about in this section

1. ISAM

2. B+Tree

46

Page 47: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

1. ISAM

47

Page 48: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

ISAM

• Indexed Sequential Access Method

– For an index with𝑏9blocks• Earlier: log,b;block

access• Now: log<=b; block

access• (𝑓𝑜 = 𝑓𝑎𝑛𝑜𝑢𝑡)

Page 49: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

1. B+ TREES

49

Page 50: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

What you will learn about in this section

1. B+Trees:Basics

2. B+Trees:Design&Cost

3. ClusteredIndexes

50

Page 51: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

B+ Trees

• Search trees – B does not mean binary!

• Idea in B Trees:–make 1 node = 1 physical page– Balanced, height adjusted tree (not the B either)

• Idea in B+ Trees:–Make leaves into a linked list (for range queries)

Page 52: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

B+ Tree Basics

10 20 30

Eachnon-leaf(“interior”)node has≥ dand≤2dkeys*

*exceptforrootnode,whichcanhavebetween1and2dkeys

Parameterd=degreeTheminimumnumberofkeyaninteriornodecanhave

Page 53: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

B+ Tree Basics

10 20 30

k<10

10≤ 𝑘<20

20≤ 𝑘<3030≤ 𝑘

Thenkeysinanodedefinen+1ranges

Page 54: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

B+ Tree Basics

10 20 30

Non-leaforinternalnode

22 25 28

Foreachrange,inanon-leafnode,thereisapointer toanothernodewithkeysinthatrange

Page 55: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

B+ Tree Basics

10 20 30

Leafnodesalsohavebetweendand2dkeys,andaredifferentinthat:

22 25 28 29

Leaf nodes

32 34 37 38

Non-leaforinternalnode

12 17

Page 56: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

B+ Tree Basics

10 20 30

22 25 28 29

Leaf nodes

32 34 37 38

Non-leaforinternalnode

12 17

Leafnodesalsohavebetweendand2dkeys,andaredifferentinthat:

Theirkeyslotscontainpointerstodatarecords

21 22 27 28 30 33 35 3715

11

Page 57: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

B+ Tree Basics

10 20 30

22 25 28 29

Leaf nodes

32 34 37 38

Non-leaforinternalnode

12 17

21 22 27 28 30 33 35 3715

11

Leafnodesalsohavebetweendand2dkeys,andaredifferentinthat:

Theirkeyslotscontainpointerstodatarecords

Theycontainapointertothenextleafnodeaswell,forfastersequentialtraversal

Page 58: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

B+ Tree Basics

10 20 30

22 25 28 29

Leaf nodes

32 34 37 38

Non-leaforinternalnode

12 17

Notethatthepointersattheleaflevelwillbetotheactualdatarecords(rows).

Wemighttruncatetheseforsimplerdisplay(asbefore)…

Name:JohnAge:21

Name:JakeAge:15

Name:BobAge:27

Name:SallyAge:28

Name:SueAge:33

Name:JessAge:35

Name:AlfAge:37Name:Joe

Age:11

Name:BessAge:22

Name:SalAge:30

Page 59: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Some finer points of B+ Trees

Page 60: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Searching a B+ Tree

• For exact key values:–Start at the root–Proceed down, to the leaf

• For range queries:–As above–Then sequential traversal

SELECT nameFROM peopleWHERE age = 25

SELECT nameFROM peopleWHERE 20 <= ageAND age <= 30

Page 61: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

B+ Tree Exact Search Animation

80

20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

10 12 15 20 28 30 40 60 63 80 84 89

K=30?

30<80

30in[20,60)

Tothedata!

Notallnodespictured

30in[30,40)

Page 62: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

B+ Tree Range Search Animation

80

20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

10 12 15 20 28 30 40 59 63 80 84 89

Kin[30,85]?

30<80

30in[20,60)

Tothedata!

Notallnodespictured

30in[30,40)

Page 63: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

B+ Tree Design

• How large is d?

• Example:– Key size = 4 bytes– Pointer size = 8 bytes– Block size = 4096 bytes

• We want each node to fit on a single block/page– 2d x 4 + (2d+1) x 8 <= 4096 à d <= 170

Page 64: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

B+ Tree: High Fanout = Smaller & Lower IO

• As compared to e.g. binary search trees, B+ Trees have high fanout(between d+1 and 2d+1)

• This means that the depth of the tree is small à getting to any element requires very few IO operations!– Also can often store most or all of the

B+ Tree in main memory!

Thefanout isdefinedasthenumberofpointerstochildnodescomingoutofanode

Notethatfanout isdynamic-we’lloftenassumeit’sconstantjusttocomeupwithapproximateeqns!

Page 65: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Simple Cost Model for Search• Let:

– f = fanout, which is in [d+1, 2d+1] (we’ll assume it’s constant for our cost model…)– N = the total number of pages we need to index– F = fill-factor (usually ~= 2/3)

• Our B+ Tree needs to have room to index N / F pages!– We have the fill factor in order to leave some open slots for faster insertions

• What height (h) does our B+ Tree need to be?– h=1 à Just the root node- room to index f pages– h=2 à f leaf nodes- room to index f2 pages– h=3 à f2 leaf nodes- room to index f3 pages– …– h à fh-1 leaf nodes- room to index fh pages!

àWeneedaB+Treeofheighth= logH

IJ

Page 66: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Fast Insertions & Self-Balancing

– Same cost as exact search– Self-balancing: B+ Tree remains balanced (with respect to

height) even after insert

B+Treesalso(relatively)fastforsingleinsertions!However,canbecomebottleneckifmanyinsertions(iffill-

factorslackisusedup…)

Page 67: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Insertion

• Perform a search to determine what bucket the new record should go into.

• If the bucket is not full (at most (b-1) entries after the insertion), add the record.

• Otherwise, split the bucket.– Allocate new leaf and move half the bucket's elements to the new bucket.– Insert the new leaf's smallest key and address into the parent.– If the parent is full, split it too.

• Add the middle key to the parent node.– Repeat until a parent is found that need not split.

• If the root splits, create a new root which has one key and two pointers. (That is, the value that gets pushed to the new root gets removed from the original node)

• Note: B-trees grow at the root and not at the leave

CSC261,Spring2017,UR

Page 68: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Insertion (Insert 85)

CSC261,Spring2017,UR

b=f=branchingfactor/fan-out=3

20 40 80 90

40 80

Page 69: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Insertion (Insert 85)

CSC261,Spring2017,UR

b=f=branchingfactor/fan-out=3

20 40 80 90

40 80

85

Thisiswhatwewouldlike.Butthemaximumnumberofkeysinanynodeis(3-1)=2So,split.

Page 70: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Insertion (Insert 85)

CSC261,Spring2017,UR

b=f=branchingfactor/fan-out=3

20 40 80

40 80

85

Allocatenewleafandmovehalfthebucket'selementstothenewbucket.

90

Page 71: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Insertion (Insert 85)

CSC261,Spring2017,UR

b=f=branchingfactor/fan-out=3

20 40 80

40 80

85

Insertthenewleaf'ssmallestkeyandaddressintotheparent.

90

85

Page 72: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Insertion (Insert 85)

CSC261,Spring2017,UR

b=f=branchingfactor/fan-out=3

20 40 80

40 80

85

Thisisnotallowed,astheparentisfull.Needtosplit

90

85

Page 73: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Insertion (Insert 85)

CSC261,Spring2017,UR

b=f=branchingfactor/fan-out=3

20 40 80

40

80

85

Iftheparentisfull,splitittoo.

Addthemiddlekeytotheparentnode.

Repeatuntilaparentisfoundthatneednotsplit

90

85

Page 74: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Insertion (Insert 85)

CSC261,Spring2017,UR

b=f=branchingfactor/fan-out=3

20 40 80

40

80

85

Iftherootsplits,createanewrootwhichhasonekeyandtwopointers.

(Thatis,thevaluethatgetspushedtothenewrootgetsremovedfromtheoriginalnode)

90

85

Page 75: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Deletion

• Start at root, find leaf L where entry belongs.• Remove the entry.– If L is at least half-full, done!– If L has fewer entries than it should,

• If sibling (adjacent node with same parent as L) is more than half-full, re-distribute, borrowing an entry from it.

• Otherwise, sibling is exactly half-full, so we can merge L and sibling.• If merge occurred, must delete entry (pointing to L or sibling)

from parent of L.• Merge could propagate to root, decreasing height.

• https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html

• The degree in this visualization is actually fan-out f or branching factor b.

CSC261,Spring2017,UR

Page 76: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

• Find 86

CSC261,Spring2017,UR

Page 77: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

• Delete it

CSC261,Spring2017,UR

Page 78: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

• Stealing from right sibling (redistribute).• Modify the parent node

CSC261,Spring2017,UR

Page 79: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

• Finally,

CSC261,Spring2017,UR

Page 80: CSC 261/461 –Database Systems Lecture 16 · 2017-03-23 · File Types •Unordered Records (Heap Files) •Ordered Records (Sorted Files) ... What you will learn about in this section

Acknowledgement

• Some of the slides in this presentation are taken from the slides provided by the authors.

• Many of these slides are taken from cs145 course offered byStanford University.

CSC261,Spring2017,UR