DATABASE SYSTEM IMPLEMENTATION
GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ
LECTURE #9: INDEX LOCKING & LATCHING
ANATOMY OF A DATABASE SYSTEM
Process Manager
→ Connection Manager + Admission Control
Query Processor
→ Query Parser
→ Query Optimizer
→ Query Executor
Transactional Storage Manager
→ Lock Manager (Concurrency Control)
→ Access Methods (or Indexes)
→ Buffer Pool Manager
→ Log Manager
Shared Utilities
→ Memory Manager + Disk Manager
→ Networking Manager
Source: Anatomy of a Database System
TODAY'S AGENDA
Index Locks vs. Latches
Latch Implementations
Index Latching (Physical)
Index Locking (Logical)
DATABASE INDEX
A data structure that improves the speed of data retrieval operations on a table at the cost of additional writes and storage space.
→ Example: Glossary in a textbook
Indexes are used to quickly locate data without having to search every row in a table every time the table is accessed.
DATA STRUCTURES
Order-Preserving Indexes
→ A tree-like structure that maintains keys in some sorted order.
→ Supports all possible predicates with O(log n) searches.
→ Example: AGE > 20 AND AGE < 30
Hashing Indexes
→ An associative array that maps a hash of the key to a particular record.
→ Only supports equality predicates with O(1) searches.
→ Example: AGE = 20
B-TREE VS. B+TREE
The original B-tree from 1972 stored keys + values in all nodes in the tree.
→ More memory efficient since each key only appears once in the tree.
A B+tree only stores values in leaf nodes. Inner nodes only guide the search process.
→ Easier to manage concurrent index access when the values are only in the leaf nodes.
Reference: Differences between B-Tree and B+Tree
OBSERVATION
We already know how to use locks to protect objects in the database.
But we have to treat indexes differently because their physical structure is allowed to change as long as their logical contents remain consistent.
SIMPLE EXAMPLE
Node A contains the keys [20, 22].
→ Txn #1 begins to read '22' from node A.
→ Txn #2 concurrently inserts '21', which splits A: the keys are redistributed into B [20] and C [21, 22].
→ Txn #1's in-flight read now has to find '22' in the new node C. If the two threads interleave on A without any protection, Txn #1 can observe the node mid-split and miss a key that is logically still in the index.
LOCKS VS. LATCHES
Locks
→ Protect the index's logical contents from other txns.
→ Held for txn duration.
→ Need to be able to rollback changes.
Latches
→ Protect the critical sections of the index's internal data structure from other threads.
→ Held for operation duration.
→ Do not need to be able to rollback changes.
Reference: A Survey of B-Tree Locking Techniques (TODS 2010)
LATCH IMPLEMENTATIONS
Blocking OS Mutex
Test-and-Set Spinlock
Queue-based Spinlock
Reader-Writer Locks
Source: Anastasia Ailamaki
COMPARE-AND-SWAP INSTRUCTION
Atomic instruction that compares the contents of a memory location M to a given value V:
→ If the values are equal, installs the new given value V' in M.
→ Otherwise the operation fails.

__sync_bool_compare_and_swap(&M, 20, 30)
→ &M is the address, 20 is the compare value, 30 is the new value.
→ If M holds 20 before the call, it holds 30 afterward; otherwise M is unchanged.
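The builtin shown above is GCC-specific. As a portable illustration (not from the slides), here is a minimal sketch of the same compare-and-swap behavior using C++11's std::atomic; the variable names mirror the example:

#include <atomic>
#include <cassert>

int main() {
  std::atomic<int> M{20};

  // Install 30 if M currently holds the compare value 20.
  int expected = 20;
  bool ok = M.compare_exchange_strong(expected, 30);
  assert(ok && M.load() == 30);

  // A stale compare value fails, and 'expected' is rewritten
  // with the current contents of M.
  expected = 20;
  ok = M.compare_exchange_strong(expected, 40);
  assert(!ok && expected == 30 && M.load() == 30);
  return 0;
}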
LATCH IMPLEMENTATIONS
Choice #1: Blocking OS Mutex
→ Simple to use.
→ Non-scalable (about 25ns per lock/unlock invocation).
→ Example: std::mutex (pthread_mutex_t underneath)

std::mutex m;
⋮
m.lock();
// Do something special...
m.unlock();
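One usage note (not on the slide): production code normally wraps the lock/unlock pair in an RAII guard so the mutex is released on every exit path, including exceptions. A minimal sketch:

#include <mutex>

std::mutex m;

void do_something_special() {
  std::lock_guard<std::mutex> guard(m);  // acquires m here
  // Do something special...
}                                        // releases m when guard is destroyed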
LATCH IMPLEMENTATIONS
Choice #2: Test-and-Set Spinlock (TAS)
→ Very efficient (single instruction to lock/unlock).
→ Not OS-dependent (unlike mutex).
→ Non-scalable, not cache friendly, busy-waiting.
→ Example: std::atomic<T>, e.g., std::atomic_flag or std::atomic<bool>

std::atomic_flag latch;
⋮
while (latch.test_and_set(…)) {
  // Yield? Abort? Retry?
}
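A self-contained sketch of a TAS latch built from std::atomic_flag; the class name and the yield-on-contention policy are illustrative choices, one possible answer to the "Yield? Abort? Retry?" question above:

#include <atomic>
#include <thread>

class TasLatch {
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
  void lock() {
    // test_and_set returns the previous value: spin until we are the
    // thread that flipped it from clear to set.
    while (flag_.test_and_set(std::memory_order_acquire)) {
      std::this_thread::yield();  // back off instead of burning the core
    }
  }
  void unlock() {
    flag_.clear(std::memory_order_release);
  }
};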
LATCH IMPLEMENTATIONS
Choice #3: Queue-based Spinlock (MCS)
→ More efficient than mutex, better cache locality.
→ With TAS, all threads poll a single shared latch variable.
→ With MCS, each thread polls on its predecessor in a queue.
→ Example: std::atomic<Latch*>
→ Due to Mellor-Crummey and Scott.

[Figure: the base latch's next pointer leads to CPU1's queue node, whose next pointer leads to CPU2's node, then CPU3's; each arriving CPU appends its own latch node at the tail and spins locally until its predecessor hands the latch over.]
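A sketch of an MCS latch in the style of Mellor-Crummey and Scott; the QNode/McsLatch names are illustrative. Each thread supplies its own queue node (typically stack-allocated) and busy-waits on a flag in that node, so the only globally contended word is the tail pointer:

#include <atomic>

struct QNode {
  std::atomic<QNode*> next{nullptr};
  std::atomic<bool> locked{false};
};

class McsLatch {
  std::atomic<QNode*> tail_{nullptr};  // the "base latch"
public:
  void lock(QNode* me) {
    me->next.store(nullptr, std::memory_order_relaxed);
    // Append ourselves to the tail of the queue.
    QNode* prev = tail_.exchange(me, std::memory_order_acq_rel);
    if (prev != nullptr) {
      // Queue was non-empty: publish ourselves to our predecessor and
      // spin on our *own* node until it hands the latch over.
      me->locked.store(true, std::memory_order_relaxed);
      prev->next.store(me, std::memory_order_release);
      while (me->locked.load(std::memory_order_acquire)) { /* local spin */ }
    }
  }
  void unlock(QNode* me) {
    QNode* succ = me->next.load(std::memory_order_acquire);
    if (succ == nullptr) {
      // No visible successor: try to swing the tail back to empty.
      QNode* expected = me;
      if (tail_.compare_exchange_strong(expected, nullptr,
                                        std::memory_order_acq_rel))
        return;
      // A successor is mid-enqueue; wait for its pointer to appear.
      while ((succ = me->next.load(std::memory_order_acquire)) == nullptr) {}
    }
    succ->locked.store(false, std::memory_order_release);  // hand off
  }
};

This is exactly the bullet above in code: the shared tail is touched once per lock/unlock, while the busy-wait happens on a thread-local cache line.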
LATCH IMPLEMENTATIONS
Choice #4: Reader-Writer Locks
→ Allows for concurrent readers.
→ Have to manage read/write queues to avoid starvation.
→ Can be implemented on top of spinlocks.

[Figure: the latch keeps per-mode counters, all starting at 0: active readers, waiting readers, active writers, waiting writers. Two readers arrive and the active-reader count goes to 2; a writer then arrives and must wait (waiting-writer count 1); a later reader queues behind the writer (waiting-reader count 1).]
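A minimal sketch of a reader-writer latch built from atomics, with a simple writer-preference rule to address the starvation bullet above; the counter layout and yield-based back-off are illustrative choices:

#include <atomic>
#include <thread>

class RwLatch {
  std::atomic<int> readers_{0};          // active readers
  std::atomic<int> waiting_writers_{0};  // writers queued up
  std::atomic<bool> writer_{false};      // a writer holds the latch
public:
  void lock_read() {
    for (;;) {
      // New readers wait while a writer is active or queued, so a
      // steady stream of readers cannot starve writers.
      if (!writer_.load() && waiting_writers_.load() == 0) {
        readers_.fetch_add(1);
        if (!writer_.load()) return;  // recheck: a writer may have won
        readers_.fetch_sub(1);        // back out and retry
      }
      std::this_thread::yield();
    }
  }
  void unlock_read() { readers_.fetch_sub(1); }

  void lock_write() {
    waiting_writers_.fetch_add(1);
    bool expected = false;
    while (!writer_.compare_exchange_weak(expected, true)) {
      expected = false;
      std::this_thread::yield();
    }
    // We own the writer flag; wait for in-flight readers to drain.
    while (readers_.load() != 0) std::this_thread::yield();
    waiting_writers_.fetch_sub(1);
  }
  void unlock_write() { writer_.store(false); }
};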
LATCH CRABBING
Acquire and release latches on B+Tree nodes when traversing the data structure.
A thread can release the latch on a parent node if its child node is considered safe.
→ A node is safe if it won't split or merge when updated:
→ Not full (on insertion).
→ More than half-full (on deletion).

Search: Start at the root and go down; repeatedly,
→ Acquire a read (R) latch on the child.
→ Then unlatch the parent node (see the sketch below).
Insert/Delete: Start at the root and go down, obtaining write (W) latches as needed. Once the child is latched, check if it is safe:
→ If the child is safe, release all latches on its ancestors.
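A sketch of hand-over-hand latching for a read-only descent, reusing the RwLatch sketch from Choice #4 above; the Node layout and helper methods are illustrative, not a prescribed design:

#include <algorithm>
#include <cstddef>
#include <vector>

struct Node {
  RwLatch latch;               // per-node latch (sketch above)
  bool is_leaf = false;
  std::vector<int> keys;       // separators (inner) or stored keys (leaf)
  std::vector<Node*> children; // children.size() == keys.size() + 1 (inner)

  Node* child_for(int key) {   // which subtree covers 'key'
    std::size_t i = 0;
    while (i < keys.size() && key >= keys[i]) ++i;
    return children[i];
  }
  bool contains(int key) {
    return std::binary_search(keys.begin(), keys.end(), key);
  }
};

// Crabbing search: latch the child *before* releasing the parent, so no
// writer can split or merge the link we are about to follow.
bool crabbing_search(Node* root, int key) {
  Node* cur = root;
  cur->latch.lock_read();
  while (!cur->is_leaf) {
    Node* child = cur->child_for(key);
    child->latch.lock_read();   // acquire child first...
    cur->latch.unlock_read();   // ...then unlatch the parent
    cur = child;
  }
  bool found = cur->contains(key);
  cur->latch.unlock_read();
  return found;
}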
EXAMPLE #1: SEARCH 23
Tree: root A [20]; inner nodes B [10] and C [35]; leaves D [6], E [12], F [23], and G [38, 44].
→ Acquire an R latch on A.
→ 23 > 20, so acquire an R latch on C. We can release the latch on A as soon as we acquire the latch for C.
→ 23 < 35, so acquire an R latch on F and release the latch on C.
→ Find 23 in F and release the latch on F.
EXAMPLE #2: DELETE 44
→ Acquire a W latch on A.
→ Acquire a W latch on C. Deleting 44 could force G to merge, and we may then need to coalesce C, so we can't release the latch on A.
→ Acquire a W latch on G. G will not merge with F after the delete, so we can release the latches on A and C.
→ Delete 44 from G and release the latch on G.
EXAMPLE #3: INSERT 40
→ Acquire a W latch on A.
→ Acquire a W latch on C. C has room for a new separator key if its child has to split, so we can release the latch on A.
→ Acquire a W latch on G. G is full, so it has to split; we can't release the latch on C.
→ Split G: keep [38, 40] in G, move 44 to a new leaf H, install 44 as a separator in C, then release all remaining latches.
OBSERVATION
What was the first step that the DBMS took in the two examples that updated the index?
In both Delete 44 and Insert 40, the first step was to acquire a W latch on the root A. Every update therefore serializes on the root's write latch, which becomes a bottleneck.
BETTER LATCH CRABBING
Optimistically assume that the leaf is safe.
→ Take R latches as you traverse the tree to reach it, and verify.
→ If the leaf is not safe, then fall back to the previous algorithm.
Also called optimistic lock coupling (see the sketch below).
Reference: Concurrency of Operations on B-Trees (Acta Informatica 9: 1-21, 1977)
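A sketch of optimistic lock coupling for an insert, reusing the Node/RwLatch machinery from the crabbing sketch above; kMaxKeys and the restart policy are illustrative, and the pessimistic fallback is only declared:

#include <algorithm>
#include <cstddef>

constexpr std::size_t kMaxKeys = 4;           // illustrative node capacity

bool leaf_is_safe_for_insert(const Node* n) { // "not full": no split needed
  return n->keys.size() < kMaxKeys;
}

void pessimistic_insert(Node* root, int key); // classic W-latch crabbing (not shown)

void optimistic_insert(Node* root, int key) {
  Node* cur = root;
  if (cur->is_leaf) cur->latch.lock_write();  // degenerate one-node tree
  else cur->latch.lock_read();
  while (!cur->is_leaf) {
    Node* child = cur->child_for(key);
    if (child->is_leaf) child->latch.lock_write(); // W latch at the leaf only
    else child->latch.lock_read();                 // R latches everywhere else
    cur->latch.unlock_read();
    cur = child;
  }
  if (leaf_is_safe_for_insert(cur)) {
    cur->keys.insert(                              // ordered insert into leaf
        std::lower_bound(cur->keys.begin(), cur->keys.end(), key), key);
    cur->latch.unlock_write();
  } else {
    cur->latch.unlock_write();                     // leaf would split:
    pessimistic_insert(root, key);                 // fall back and redo
  }
}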
EXAMPLE #4: DELETE 44
→ Acquire an R latch on A.
→ Acquire an R latch on C. We optimistically assume that C is safe, so we can release the latch on A.
→ Acquire an exclusive (W) latch on the leaf G.
→ Delete 44 from G. G stays safe, so the optimistic assumption held and no restart is needed; release the latch on G.
OBSERVATION
Crabbing ensures that txns do not corrupt the internal data structure during modifications.
But because txns release latches on each node as soon as they finish their operations, we cannot guarantee that phantoms do not occur…
Latches only protect the physical contents of the data structure, not the logical contents.
PROBLEM SCENARIO #1
→ Txn #1: Check if 25 exists. It crabs down with R latches to the leaf F, finds no 25, and releases all of its latches.
→ Txn #2: Insert 25. It crabs down with W latches and adds 25 to F.
→ Txn #1: Insert 25, acting on its earlier (now stale) observation that 25 did not exist. Both txns have now inserted the same key, even though each behaved correctly under latching alone.
PROBLEM SCENARIO #2
→ Txn #1: Scan [12, 23]. It reads the leaves E [12] and F [23] under R latches, then releases them.
→ Txn #2: Insert 21. It takes W latches and adds 21 to F, which now holds [21, 23].
→ Txn #1: Scan [12, 23] again. The rescan now returns 21 as well, so the same range query produced different results within one txn: a phantom.
INDEX LOCKS
Need a way to protect the index's logical contents from other txns to avoid phantoms.
Difference with index latches:
→ Locks are held for the entire duration of a txn.
→ Only acquired at the leaf nodes.
→ Not physically stored in the index data structure.
INDEX LOCKS
Index locks are kept in a lock table that the DBMS's lock manager maintains separately from the index itself.
[Figure: the lock table maps each locked key to a queue of requests, e.g., txn1 holding X with txn2 and txn3 waiting for S; txn3, txn2, and txn4 sharing S; txn4 holding IX with txn6 waiting for X and txn5 waiting for S.]
INDEX LOCKING SCHEMES
Predicate Locks
Key-Value Locks
Gap Locks
Key-Range Locks
Hierarchical Locking
PREDICATE LOCKS
Proposed locking scheme from System R.
→ Shared lock on the predicate in a WHERE clause of a SELECT query.
→ Exclusive lock on the predicate in a WHERE clause of any UPDATE, INSERT, or DELETE query.
Never implemented in any system.
Reference: The Notions of Consistency and Predicate Locks in a Database System (CACM 1976)
PREDICATE LOCKS

SELECT SUM(balance)
FROM account
WHERE name = 'Biggie';
→ Shared lock on the predicate name='Biggie'.

INSERT INTO account (name, balance)
VALUES ('Biggie', 100);
→ Exclusive lock on the predicate name='Biggie' ∧ balance=100.

The two predicates overlap on the records in table "account" that satisfy name='Biggie', so the INSERT conflicts with the SELECT.
KEY-VALUE LOCKS
Locks that cover a single key value.
Need "virtual keys" for non-existent values.

B+Tree Leaf Node: [10 | 12 | 14 | 16]
→ A key lock on [14, 14] covers only the key 14.
GAP LOCKS
Each txn acquires a key-value lock on the single key that it wants to access, then gets a gap lock on the next key gap.

B+Tree Leaf Node: [10 | {Gap} | 12 | {Gap} | 14 | {Gap} | 16]
→ A gap lock on (14, 16) covers the gap between keys 14 and 16, but neither key itself.
KEY-RANGE LOCKS
A txn takes locks on ranges in the key space.
→ Each range runs from one key that appears in the relation to the next key that appears.
→ Define lock modes so that the conflict table captures the commutativity of the available operations.
→ Example: Intention locks (IX, IS)
KEY-RANGE LOCKS
Locks that cover a key value and the gap to the next key value in a single index.
→ Need "virtual keys" for artificial values (infinity).

B+Tree Leaf Node: [10 | {Gap} | 12 | {Gap} | 14 | {Gap} | 16]
→ Next-key lock [14, 16): covers key 14 plus the gap up to, but not including, 16.
→ Prior-key lock (12, 14]: covers the gap after 12 plus key 14.
HIERARCHICAL LOCKING
Allow a txn to hold wider key-range locks with different locking modes.
→ Reduces the number of visits to the lock manager.

B+Tree Leaf Node: [10 | {Gap} | 12 | {Gap} | 14 | {Gap} | 16]
→ One txn takes IX on the range [10, 16) and then X on the sub-range [14, 16).
→ Another txn can still take IX on [10, 16) together with X on just the key [12, 12], because the two X ranges do not overlap and the IX modes are compatible.
LOCK-FREE INDEXES
Possibility #1: No Locks
→ Txns don't acquire locks to access/modify the database.
→ Still have to use latches to install updates.
Possibility #2: No Latches
→ Swap pointers using atomic updates to install changes.
→ Still have to use locks to validate txns.
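To make Possibility #2 concrete, here is a classic latch-free pointer install (a Treiber-style push, chosen for illustration, not taken from the slide): build the change off to the side, then publish it with one atomic pointer swap. Safe reclamation of unlinked nodes (e.g., epoch-based garbage collection) is deliberately omitted:

#include <atomic>

struct Node {
  int key;
  Node* next;
};

std::atomic<Node*> head{nullptr};

// Latch-free push: construct the new node privately, then CAS the head
// pointer to it. On failure, 'n->next' is refreshed to the head that
// beat us, and we simply retry.
void push(int key) {
  Node* n = new Node{key, head.load(std::memory_order_relaxed)};
  while (!head.compare_exchange_weak(n->next, n,
                                     std::memory_order_release,
                                     std::memory_order_relaxed)) {
  }
}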
PARTING THOUGHTS
Hierarchical locking essentially provides predicate locking without the complications.
→ Index locking occurs only in the leaf nodes.
→ Latching ensures a consistent data structure.
Supporting higher levels of isolation (e.g., serializable isolation) requires a locking protocol that can avoid phantoms.
NEXT CLASS
Index Key Representation
Memory Allocation & Garbage Collection
T-Trees (1980s / TimesTen)
Bw-Tree (Hekaton)
Concurrent Skip Lists (MemSQL)