58
base Management Systems 3ed, R. Ramakrishnan and J. Gehrke CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Embed Size (px)

Citation preview

Page 1: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

CSG131 Summary

Donghui Zhang

Page 2: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2

What we learned Storage (Chp 8 & 9) B+-tree (Chp 10) Hash index (Chp 11) External sort (Chp 13) Query evaluation Concurrency control (Chp 17) Recovery (Chp 18) Project: simple DBMS implentation,

NEUStore SB-tree, R-tree.

Page 3: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3

Components of a Disk

Platters

The platters spin (say, 90rps).

Spindle

The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder (imaginary!).

Disk head

Arm movement

Arm assembly

Only one head reads/writes at any one time.

Tracks

Sector

Block size is a multiple of sector size (which is fixed).

Page 4: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4

Accessing a Disk Page

Time to access (read/write) a disk block: seek time (moving arms to position disk head on track) rotational delay (waiting for block to rotate under head) transfer time (actually moving data to/from disk surface)

Seek time and rotational delay dominate. Seek time varies from about 1 to 20msec Rotational delay varies from 0 to 10msec Transfer rate is about 1msec per 4KB page

Key to lower I/O cost: reduce seek/rotation delays! Hardware vs. software solutions?

Page 5: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5

Index Types A primary index is an index which

controls the actual storage of a table. Typically this index is built using the primary key of the table. A data entry is one record of the table.

A secondary index is an index which is built using some other attribute(s). A data entry contains a set of RIDs.

Page 6: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6

An Example3325 6632

0123

Donghui Zhang

33 617-373-2177

0222

James Bond

55 617-324-6685

2418

Betty Wu

21 951-666-8888332

5Jessica Greene

21

417-332-6659

3891

Frank Gates

33

433-222-1956

6632

Tim Varner

55 617-333-6782

9688

Stephanie Chen

33 951-243-0104

10

20

30

primary index

secondary index21: {10, 20}33: {10, 20, 30}55: {10, 30}

Page 7: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7

Buffer Management in a DBMS

Data must be in RAM for DBMS to operate on it! For each page, dirty bit, pin count.

DB

MAIN MEMORY

DISK

disk page

free frame

Page Requests from Higher Levels

BUFFER POOL

choice of frame dictatedby replacement policy

Page 8: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8

Page Formats: Variable Length Records

Can move records on page without changing rid; so, attractive for fixed-length records too.

Page iRid = (i,N)

Rid = (i,2)

Rid = (i,1)

Pointerto startof freespace

SLOT DIRECTORY

N . . . 2 120 16 24 N

# slots

Page 9: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9

What we learned Storage (Chp 8 & 9) B+-tree (Chp 10) Hash index (Chp 11) External sort (Chp 13) Query evaluation Concurrency control (Chp 17) Recovery (Chp 18) Project: simple DBMS implentation,

NEUStore SB-tree, R-tree

Page 10: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10

Example B+ Tree

2* 3*

Root

17

24 30

14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

135

7*5* 8*

Page 11: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11

B+ Trees in Practice

Typical order: 100. Typical fill-factor: 67%. average fanout = 133

Can often hold top levels in buffer pool: Level 1 = 1 page = 8 KB Level 2 = 133 pages = 1 MB Level 3 = 17,689 pages = 145 MB Level 4 = 2,352,637 pages = 19 GB With 1 MB buffer, can locate one record

in 19 GB (or 0.3 billion records) in two I/Os!

Page 12: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 12

B+-tree index Structure Search Insert Delete

Bulk-loading a B+-tree Aggregation Query SB-tree

Page 13: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13

What we learned Storage (Chp 8 & 9) B+-tree (Chp 10) Hash index (Chp 11) External sort (Chp 13) Query evaluation Concurrency control (Chp 17) Recovery (Chp 18) Project: simple DBMS implentation,

NEUStore SB-tree, R-tree

Page 14: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14

Extendible Hashing Directory is array of size 4. To find bucket for r, take

last `global depth’ # bits of h(r); we denote r by h(r). If h(r) = 5 = binary 101,

it is in bucket pointed to by 01.

Insert: If bucket is full, split it (allocate new page, re-distribute).

If necessary, double the directory. (As we will see, splitting a bucket does not always require doubling; we can tell by comparing global depth with local depth for the split bucket.)

13*00

01

10

11

2

2

2

2

2

LOCAL DEPTH

GLOBAL DEPTH

DIRECTORY

Bucket A

Bucket B

Bucket C

Bucket D

DATA PAGES

10*

1* 21*

4* 12* 32* 16*

15* 7* 19*

5*

Page 15: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 15

Insert h(r)=20 (Causes Doubling)

20*

00

01

10

11

2 2

2

2

LOCAL DEPTH 2

2

DIRECTORY

GLOBAL DEPTHBucket A

Bucket B

Bucket C

Bucket D

Bucket A2(`split image'of Bucket A)

1* 5* 21*13*

32*16*

10*

15* 7* 19*

4* 12*

19*

2

2

2

000

001

010

011

100

101

110

111

3

3

3DIRECTORY

Bucket A

Bucket B

Bucket C

Bucket D

Bucket A2(`split image'of Bucket A)

32*

1* 5* 21*13*

16*

10*

15* 7*

4* 20*12*

LOCAL DEPTH

GLOBAL DEPTH

Page 16: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 16

Example of Linear Hashing

On split, hLevel+1 is used to re-distribute entries.

0hh

1

(This infois for illustrationonly!)

Level=0, N=4

00

01

10

11

000

001

010

011

(The actual contentsof the linear hashedfile)

Next=0

PRIMARYPAGES

Data entry rwith h(r)=5

Primary bucket page

44* 36*32*

25*9* 5*

14* 18*10*30*

31*35* 11*7*

0hh

1

Level=0

00

01

10

11

000

001

010

011

Next=1

PRIMARYPAGES

44* 36*

32*

25*9* 5*

14* 18*10*30*

31*35* 11*7*

OVERFLOWPAGES

43*

00100

Let hLevel=(a*key+b) % 4hLevel+1=(a*key+b) % 8

If insert 22……

Page 17: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 17

Example of Linear Hashing

0hh

1

Level=0

00

01

10

11

000

001

010

011

Next=2

PRIMARYPAGES

44* 36*

32*

25*9*

14* 18*10*30*

31*35* 11*7*

OVERFLOWPAGES

43*

00100

Let hLevel=(a*key+b) % 4hLevel+1=(a*key+b) % 8

22*

5*01101

Important notice: after buckets 010 and 011are split, Next moves back To 000!

Page 18: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18

What we learned Storage (Chp 8 & 9) B+-tree (Chp 10) Hash index (Chp 11) External sort (Chp 13) Query evaluation Concurrency control (Chp 17) Recovery (Chp 18) Project: simple DBMS implentation,

NEUStore SB-tree, R-tree

Page 19: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 19

General External Merge Sort

To sort a file with N pages using B buffer pages: Pass 0: use B buffer pages. Produce sorted

runs of B pages each. Pass 2, …, etc.: merge B-1 runs.

N B/

B Main memory buffers

INPUT 1

INPUT B-1

OUTPUT

DiskDisk

INPUT 2

. . . . . .

. . .

More than 3 buffer pages. How can we utilize them?

Page 20: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 20

Cost of External Merge Sort

Number of passes: Cost = 2N * (# of passes) E.g., with 5 buffer pages, to sort 108 page

file: Pass 0: = 22 sorted runs of 5 pages

each (last run is only 3 pages) Pass 1: = 6 sorted runs of 20 pages

each (last run is only 8 pages) Pass 2: 2 sorted runs, 80 pages and 28 pages Pass 3: Sorted file of 108 pages

Note: for merge join, no need to write result.

1 1 log /B N B

108 5/

22 4/

Page 21: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 21

What we learned Storage (Chp 8 & 9) B+-tree (Chp 10) Hash index (Chp 11) External sort (Chp 13) Query evaluation Concurrency control (Chp 17) Recovery (Chp 18) Project: simple DBMS implentation,

NEUStore SB-tree, R-tree

Page 22: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 22

parse

convert

apply laws

estimate result sizes

consider physical plans estimate costs

pick best

execute

{P1,P2,…..}

{(P1,C1),(P2,C2)...}

Pi

answerSQL query

parse tree

logical query plan

“ improved” l.q.p

l.q.p. +sizes

statistics

query compiling

query optimization

query execution

Page 23: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 23

Query Compiling Convert

Input: a parse tree. Output: a logical query plan.

Algorithm: followed by . E.Name(E.SSN<5000 AND E.Age>50(E) ) Alternatively, a l.q.p tree.

SELECT E.NameFROM Emp EWHERE E.SSN<5000 AND E.Age>50

Emp E

E.SSN<5000 AND E.Age>50

E.Name

Page 24: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 24

Query Optimization Consider Physical Plans

Associate each RA operator with an implementation scheme. Multiple implementation schemes? Enumerate all.

SELECT E.NameFROM Emp EWHERE E.SSN<5000 AND E.Age>50

Emp E

E.SSN<5000 AND E.Age>50

E.Name

Plan 1 (always work!)

scan

on-the-fly

Page 25: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 25

Query Optimization Consider Physical Plans

SELECT E.NameFROM Emp EWHERE E.SSN<5000 AND E.Age>50

Emp E

E.SSN<5000 AND E.Age>50

E.Name

Plan 2

range search in SSN index

on-the-fly

Page 26: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 26

Query Optimization Consider Physical Plans

SELECT E.NameFROM Emp EWHERE E.SSN<5000 AND E.Age>50

Emp E

E.SSN<5000 AND E.Age>50

E.Name

Plan 3

range search in Age index, follow pointers to SSN index

on-the-fly

Page 27: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 27

Query Optimization Estimate Costs sample input

Assume for table E: Schema = (SSN: int, Name: string, Age: int, Salary: int) T(E) = 100 tuples. For attribute SSN:

• V(E, SSN)=100, min(E, SSN)=0000, max(E, SSN)=9999 For attribute Age:

• V(E, Age)=20, min(E, Age)=21, max(E, Age)=60 Primary index on SSN: 3 level B+-tree, 50 leaf nodes. Secondary index on Age: 2 level B+-tree, 10 leaf nodes, every

leaf entry points to 3.5 pageIDs (on average). Assumptions: all B+-tree roots are pinned. Can reach the

first leaf page of a B+-tree directly. Memory buffer size: 2 pages.

SELECT E.NameFROM Emp EWHERE E.SSN<5000 AND E.Age>50

Page 28: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 28

Query Optimization Estimate Costs

Cost = 50. (The primary index has 50 leaf nodes. Assume we can reach the first leaf page of a B+-tree directly.)

SELECT E.NameFROM Emp EWHERE E.SSN<5000 AND E.Age>50

Emp E

E.SSN<5000 AND E.Age>50

E.Name

Plan 1 (always work!)

scan

on-the-fly

Page 29: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 29

Query Optimization Estimate Costs

SELECT E.NameFROM Emp EWHERE E.SSN<5000 AND E.Age>50

Emp E

E.SSN<5000 AND E.Age>50

E.Name

Plan 2

range search in SSN index

on-the-fly

Cost = 25. SSN<5000 selects half of the employees, so 50/2=25 leaf nodes.

Note: if condition is E.SSN>5000, needs 1 more I/O.

Page 30: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 30

Query Optimization Estimate Costs

SELECT E.NameFROM Emp EWHERE E.SSN<5000 AND E.Age>50

Emp E

E.SSN<5000 AND E.Age>50

E.Name

Plan 3

range search in Age index, follow pointers to SSN index

on-the-fly

Cost = 10/4 + 20/4 * 3.5 = 21.

#I/Os in the Age index #I/Os in the SSN index

Page 31: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 31

Query Optimization Pick Best

SELECT E.NameFROM Emp EWHERE E.SSN<5000 AND E.Age>50

physical plan I/O cost

Plan 1: scan 50

Plan 2: range search SSN index 25

Plan 3: range search Age index 21

Pick!

Page 32: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 32

Query Compiling Convert

Algorithm: then then .

E.Name. D.Dname(E.Did=D.Did AND E.SSN<5000 AND D.budget=1000(ED) ) The l.q.p tree:

SELECT E.Name, D.DnameFROM Emp E, Dept DWHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000

Emp E

E.Did=D.Did AND E.SSN<5000 AND D.budget=1000

E.Name, D.Dname

Dept D

Page 33: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 33

Query Compiling Apply Laws

Always always: (try to) replace with ! Also, push down.

SELECT E.Name, D.DnameFROM Emp E, Dept DWHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000

Emp E

E.SSN<5000

E.Name, D.Dname

Dept D

D.budget=1000

Page 34: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 34

Four Join Algorithms

Iteration join (nested loop join) Merge join Hash join Join with index

Page 35: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 35

Example E D over common attribute Did

E: T(E)=10,000 primary index on SSN, 3 levels. |E|= 1,000 leaf nodes.

D: T(D)=5,000 primary index on Did. 3 levels. |D| = 500 leaf nodes.

Memory available = 101 blocks

Page 36: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 36

• Iteration JoinUse our memory(1) Read 100 blocks of D(2) Read all of E (using 1 block) + join(3) Repeat until done

• I/O cost = |D| + |D|/100 * |E| =

500 + 5*1000 = 5,500.

Page 37: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 37

Merge Join Cost

Recall that |E|=1000, |D|=500. And |D| is already sorted on Did.

External sort E: pass 0, by reading and writing E, produces a file with 10 sorted runs. Another read is enough.

No need to write! Can pipeline to join operator.

Cost = 3*1000 + 500 = 3,500.

Page 38: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 38

Simple example hash: even/odd

R1 R2 Buckets2 5 Even 4 4 R1 R23 12 Odd: 5 38 139 8

1114

2 4 8 4 12 8 14

3 5 9 5 3 13 11

Page 39: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 39

Hash Join Cost

Read + write both E and D for partitioning, then read to join.

Cost = 3 * (1000 + 500) = 4,500.

Page 40: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 40

Join with index (Conceptually)

For each r E do

Find the corresponding D tuple by probing index.

Assuming the root is pinned in memory,Cost = |E| + T(E)*2 = 1000 + 10,000*2 = 21,000.

Page 41: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 41

What we learned Storage (Chp 8 & 9) B+-tree (Chp 10) Hash index (Chp 11) External sort (Chp 13) Query evaluation Concurrency control (Chp 17) Recovery (Chp 18) Project: simple DBMS implentation,

NEUStore SB-tree, R-tree

Page 42: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 42

ACID Properties of Transactions

Atomicity: all actions are carried out, or none. Consistency: each transaction preserves the

consistency of the database if executed by itself. (The users make sure of this. E.g. transfer money…)

Isolation: transactions are isolated from the effect of concurrently scheduling other transactions.

Durability: the effect of a committed transaction should last, even if system crash before all changed are flushed to disk.

Page 43: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 43

Scheduling Transactions

Serial schedule: Schedule that does not interleave the actions of different transactions.

Equivalent schedules: For any database state, the effect (on the set of objects in the database) of executing the first schedule is identical to the effect of executing the second schedule.

Serializable schedule: A schedule that is equivalent to some serial execution of the transactions.

(Note: If each transaction preserves consistency, every serializable schedule preserves consistency. )

Page 44: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 44

Example

Conflict serializable serializable. Serializable conflict serializable

T1: R(A) W(A), CommitT2: W(A), CommitT3: W(A), Commit

T1 T2 precedence graph:1. a node for each

transaction2. an arc from Ti to Tj if an

action in Ti precedes andconflicts with an action inTj.

T3

Page 45: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 45

Deadlock Prevention

Assign priorities based on timestamps. Assume Ti wants a lock that Tj holds. Two policies are possible: Wait-Die: It Ti has higher priority, Ti waits

for Tj; otherwise Ti aborts Wound-wait: If Ti has higher priority, Tj

aborts; otherwise Ti waits If a transaction re-starts, make sure it

has its original timestamp

Page 46: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 46

Multiple-Granularity Locks

Why? If a transaction needs to scan all records in a table, do we really want to have a lock on all tuples individually? Significant locking overhead!

Put a single lock on the table!

Tuples

Tables

Pages

Database

contains

A lock on a nodeimplicitly locksall decendents.

Page 47: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 47

Allowed Sharings

-- IS IX

--

IS

IX

S X

S

X

Page 48: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 48

Tree Locking Algorithm

Search: Start at root and go down; repeatedly, S lock child then unlock parent.

Insert/Delete: Start at root and go down, obtaining X locks as needed. Once child is locked, check if it is safe: If child is safe, release all locks on ancestors.

Safe node: Node such that changes will not propagate up beyond this node. Inserts: Node is not full. Deletes: Node is not half-empty.

Page 49: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 49

Example

ROOT

A

B

C

D E

F

G H I

20

35

20*

38 44

22* 23* 24* 35* 36* 38* 41* 44*

Do:1) Search 38*2) Delete 38*3) Insert 45*4) Insert 25*

23

Page 50: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 50

Kung-Robinson Model

Xacts have three phases: READ: Xacts read from the database,

but make changes to private copies of objects.

VALIDATE: Check for conflicts. WRITE: Make local copies of changes

public.ROOT

old

new

modifiedobjects

Page 51: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 51

Timestamp CC

Idea: Give each object a read-timestamp (RTS) and a write-timestamp (WTS), give each Xact a timestamp (TS) when it begins: If action ai of Xact Ti conflicts with

action aj of Xact Tj, and TS(Ti) < TS(Tj), then ai must occur before aj. Otherwise, restart violating Xact.

Page 52: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 52

What we learned Storage (Chp 8 & 9) B+-tree (Chp 10) Hash index (Chp 11) External sort (Chp 13) Query evaluation Concurrency control (Chp 17) Recovery (Chp 18) Project: simple DBMS implentation,

NEUStore SB-tree, R-tree

Page 53: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 53

The Big Picture: What’s Stored Where

DB

Data pageseachwith apageLSN

Xact TablelastLSNstatus

Dirty Page TablerecLSN

flushedLSN

RAM

prevLSNXIDtype

lengthpageID

offsetbefore-imageafter-image

LogRecords

LOG

master record

Page 54: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 54

WAL & the Log

Each log record has a unique Log Sequence Number (LSN). LSNs always increasing.

Each data page contains a pageLSN. The LSN of the most recent log record

for an update to that page. System keeps track of flushedLSN.

The max LSN flushed so far. WAL: Before a page is written,

pageLSN flushedLSN

LSNs

DB

pageLSNs

RAM

flushedLSN

pageLSN

Log recordsflushed to disk

“ Log tail” in RAM

Page 55: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 55

Normal Execution of an Xact

Series of reads & writes, followed by commit or abort. We will assume that write is atomic on disk.

• In practice, additional details to deal with non-atomic writes.

Strict 2PL. STEAL, NO-FORCE buffer management, with

Write-Ahead Logging.

Page 56: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 56

Simple Transaction Abort

For now, consider an explicit abort of a Xact. No crash involved.

We want to “play back” the log in reverse order, UNDOing updates. Get lastLSN of Xact from Xact table. Can follow chain of log records backward via

the prevLSN field. Before starting UNDO, write an Abort log record.

• For recovering from crash during UNDO!

Page 57: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 57

Transaction Commit

Write commit record to log. All log records up to Xact’s lastLSN are

flushed. Guarantees that flushedLSN lastLSN. Note that log flushes are sequential,

synchronous writes to disk. Many log records per log page.

Commit() returns. Write end record to log.

Page 58: Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 CSG131 Summary Donghui Zhang

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 58

Crash Recovery: Big Picture

Start from a checkpoint (found via master record).

Three phases. Need to:– Figure out which Xacts

committed since checkpoint, which failed (Analysis).

– REDO all actions. (repeat history)

– UNDO effects of failed Xacts.

Oldest log rec. of Xact active at crash

Smallest recLSN in dirty page table after Analysis

Last chkpt

CRASH

A R U