66
1 / 66 Multi-Version Concurrency Control Lecture 16: Multi-Version Concurrency Control

Lecture 16: Multi-Version Concurrency Control

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lecture 16: Multi-Version Concurrency Control

1 / 66

Multi-Version Concurrency Control

Lecture 16: Multi-Version Concurrency Control

Page 2: Lecture 16: Multi-Version Concurrency Control

2 / 66

Multi-Version Concurrency Control Recap

Recap

Page 3: Lecture 16: Multi-Version Concurrency Control

3 / 66

Multi-Version Concurrency Control Recap

Optimistic Concurrency Control

• The DBMS creates a private workspace for each txn.▶ Any object read is copied into workspace.▶ Modifications are applied to workspace.

• When a txn commits, the DBMS compares workspace write set to see whether itconflicts with other txns.

• If there are no conflicts, the write set is installed into the global database.

Page 4: Lecture 16: Multi-Version Concurrency Control

4 / 66

Multi-Version Concurrency Control Recap

OCC Phases

• Phase 1 – Read:▶ Track the read/write sets of txns and store their writes in a private workspace.

• Phase 2 – Validation:▶ When a txn commits, check whether it conflicts with other txns.

• Phase 3 – Write:▶ If validation succeeds, apply private changes to database. Otherwise abort and restart the

txn.

Page 5: Lecture 16: Multi-Version Concurrency Control

5 / 66

Multi-Version Concurrency Control Recap

Today’s Agenda

• Multi-Version Concurrency Control• Design Decisions

▶ Concurrency Control Protocol▶ Version Storage▶ Garbage Collection▶ Index Management

Page 6: Lecture 16: Multi-Version Concurrency Control

6 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

Multi-Version Concurrency Control

Page 7: Lecture 16: Multi-Version Concurrency Control

7 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

Multi-Version Concurrency Control

• The DBMS maintains multiple physical versions of a single logical object in thedatabase:▶ When a txn writes to an object, the DBMS creates a new version of that object (instead of

private workspace in OCC)▶ When a txn reads an object, it reads the newest version that existed when the txn started.

Page 8: Lecture 16: Multi-Version Concurrency Control

8 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

MVCC HISTORY

• Protocol was first proposed in 1978 MIT PhD dissertation.• First implementations was Rdb/VMS and InterBase at DEC in early 1980s.

▶ Both were by Jim Starkey, co-founder of NuoDB.▶ DEC Rdb/VMS is now "Oracle Rdb"▶ InterBase was open-sourced as Firebird.

Page 9: Lecture 16: Multi-Version Concurrency Control

9 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

Multi-Version Concurrency Control

• Writers don’t block readers. Readers don’t block writers.• Read-only txns can read a consistent snapshot without acquiring locks.

▶ Use timestamps to determine visibility.

• Easily support time-travel queries.

Page 10: Lecture 16: Multi-Version Concurrency Control

10 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

MVCC – Example 1

Page 11: Lecture 16: Multi-Version Concurrency Control

11 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

MVCC – Example 1

Page 12: Lecture 16: Multi-Version Concurrency Control

12 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

MVCC – Example 1

Page 13: Lecture 16: Multi-Version Concurrency Control

13 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

MVCC – Example 1

Page 14: Lecture 16: Multi-Version Concurrency Control

14 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

MVCC – Example 1

Page 15: Lecture 16: Multi-Version Concurrency Control

15 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

MVCC – Example 2

Page 16: Lecture 16: Multi-Version Concurrency Control

16 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

MVCC – Example 2

Page 17: Lecture 16: Multi-Version Concurrency Control

17 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

MVCC – Example 2

Page 18: Lecture 16: Multi-Version Concurrency Control

18 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

MVCC – Example 2

Page 19: Lecture 16: Multi-Version Concurrency Control

19 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

MVCC – Example 2

Page 20: Lecture 16: Multi-Version Concurrency Control

20 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

MVCC – Example 2

Page 21: Lecture 16: Multi-Version Concurrency Control

21 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

MVCC – Example 2

Page 22: Lecture 16: Multi-Version Concurrency Control

22 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

Multi-Version Concurrency Control

• MVCC is more than just a Concurrency Control protocol.• It completely affects how the DBMS manages transactions and the database.• Examples: Oracle, SAP HANA, PostgreSQL, CockroachDB

Page 23: Lecture 16: Multi-Version Concurrency Control

23 / 66

Multi-Version Concurrency Control Multi-Version Concurrency Control

MVCC Design Decisions

• Concurrency Control Protocol• Version Storage• Garbage Collection• Index Management

Page 24: Lecture 16: Multi-Version Concurrency Control

24 / 66

Multi-Version Concurrency Control Concurrency Control Protocol

Concurrency Control Protocol

Page 25: Lecture 16: Multi-Version Concurrency Control

25 / 66

Multi-Version Concurrency Control Concurrency Control Protocol

Concurrency Control Protocol

• Approach 1: Timestamp Ordering▶ Assign txns timestamps that determine serial order.

• Approach 2: Optimistic Concurrency Control▶ Three-phase protocol from last class.▶ Use private workspace for new versions.

• Approach 3: Two-Phase Locking▶ Txns acquire appropriate lock on physical version before they can read/write a logical

tuple.

Page 26: Lecture 16: Multi-Version Concurrency Control

26 / 66

Multi-Version Concurrency Control Version Storage

Version Storage

Page 27: Lecture 16: Multi-Version Concurrency Control

27 / 66

Multi-Version Concurrency Control Version Storage

Version Storage

• The DBMS uses the tuples’ pointer field to create a version chain per logical tuple.▶ This allows the DBMS to find the version that is visible to a particular txn at runtime.▶ Indexes always point to the head of the chain.

• Different storage schemes determine where/what to store for each version.

Page 28: Lecture 16: Multi-Version Concurrency Control

28 / 66

Multi-Version Concurrency Control Version Storage

Version Storage

• Approach 1: Append-Only Storage▶ New versions are appended to the same table space.

• Approach 2: Time-Travel Storage▶ Old versions are copied to separate table space.

• Approach 3: Delta Storage▶ The original values of the modified attributes are copied into a separate delta record space.

Page 29: Lecture 16: Multi-Version Concurrency Control

29 / 66

Multi-Version Concurrency Control Version Storage

Append-Only Storage• All of the physical versions of a logical tuple are stored in the same table space. The

versions are mixed together.• On every update, append a new version of the tuple into an empty space in the table.

Page 30: Lecture 16: Multi-Version Concurrency Control

30 / 66

Multi-Version Concurrency Control Version Storage

Append-Only Storage• All of the physical versions of a logical tuple are stored in the same table space. The

versions are mixed together.• On every update, append a new version of the tuple into an empty space in the table.

Page 31: Lecture 16: Multi-Version Concurrency Control

31 / 66

Multi-Version Concurrency Control Version Storage

Append-Only Storage• All of the physical versions of a logical tuple are stored in the same table space. The

versions are mixed together.• On every update, append a new version of the tuple into an empty space in the table.

Page 32: Lecture 16: Multi-Version Concurrency Control

32 / 66

Multi-Version Concurrency Control Version Storage

Version Chain Ordering

• Approach 1: Oldest-to-Newest (O2N)▶ Just append new version to end of the chain.▶ Have to traverse chain on look-ups.

• Approach 2: Newest-to-Oldest (N2O)▶ Have to update index pointers for every new version.▶ Don’t have to traverse chain on look ups.

Page 33: Lecture 16: Multi-Version Concurrency Control

33 / 66

Multi-Version Concurrency Control Version Storage

Time-Travel Storage

• On every update, copy the current version to the time-travel table. Update pointers.• Overwrite master version in the main table. Update pointers.

Page 34: Lecture 16: Multi-Version Concurrency Control

34 / 66

Multi-Version Concurrency Control Version Storage

Time-Travel Storage

• On every update, copy the current version to the time-travel table. Update pointers.• Overwrite master version in the main table. Update pointers.

Page 35: Lecture 16: Multi-Version Concurrency Control

35 / 66

Multi-Version Concurrency Control Version Storage

Time-Travel Storage

• On every update, copy the current version to the time-travel table. Update pointers.• Overwrite master version in the main table. Update pointers.

Page 36: Lecture 16: Multi-Version Concurrency Control

36 / 66

Multi-Version Concurrency Control Version Storage

Time-Travel Storage

• On every update, copy the current version to the time-travel table. Update pointers.• Overwrite master version in the main table. Update pointers.

Page 37: Lecture 16: Multi-Version Concurrency Control

37 / 66

Multi-Version Concurrency Control Version Storage

Delta Storage

• On every update, copy only the values that were modified to the delta storage andoverwrite the master version.

• Txns can recreate old versions by applying the delta in reverse order.

Page 38: Lecture 16: Multi-Version Concurrency Control

38 / 66

Multi-Version Concurrency Control Version Storage

Delta Storage

• On every update, copy only the values that were modified to the delta storage andoverwrite the master version.

• Txns can recreate old versions by applying the delta in reverse order.

Page 39: Lecture 16: Multi-Version Concurrency Control

39 / 66

Multi-Version Concurrency Control Version Storage

Delta Storage

• On every update, copy only the values that were modified to the delta storage andoverwrite the master version.

• Txns can recreate old versions by applying the delta in reverse order.

Page 40: Lecture 16: Multi-Version Concurrency Control

40 / 66

Multi-Version Concurrency Control Version Storage

Delta Storage

• On every update, copy only the values that were modified to the delta storage andoverwrite the master version.

• Txns can recreate old versions by applying the delta in reverse order.

Page 41: Lecture 16: Multi-Version Concurrency Control

41 / 66

Multi-Version Concurrency Control Garbage Collection

Garbage Collection

Page 42: Lecture 16: Multi-Version Concurrency Control

42 / 66

Multi-Version Concurrency Control Garbage Collection

Garbage Collection

• The DBMS needs to remove reclaimable physical versions from the database overtime.▶ No active txn in the DBMS can see that version (SI).▶ The version was created by an aborted txn.

• Two additional design decisions:▶ How to look for expired versions?▶ How to decide when it is safe to reclaim memory?

Page 43: Lecture 16: Multi-Version Concurrency Control

43 / 66

Multi-Version Concurrency Control Garbage Collection

Garbage Collection

• Approach 1: Tuple-level▶ Find old versions by examining tuples directly.▶ Background Vacuuming vs. Cooperative Cleaning

• Approach 2: Transaction-level▶ Txns keep track of their old versions so the DBMS does not have to scan tuples to

determine visibility.

Page 44: Lecture 16: Multi-Version Concurrency Control

44 / 66

Multi-Version Concurrency Control Garbage Collection

Tuple-level GC

• Background Vacuuming:• Separate thread(s) periodically scan the table and look for reclaimable versions.• Works with any storage.

Page 45: Lecture 16: Multi-Version Concurrency Control

45 / 66

Multi-Version Concurrency Control Garbage Collection

Tuple-level GC

Page 46: Lecture 16: Multi-Version Concurrency Control

46 / 66

Multi-Version Concurrency Control Garbage Collection

Tuple-level GC

Page 47: Lecture 16: Multi-Version Concurrency Control

47 / 66

Multi-Version Concurrency Control Garbage Collection

Tuple-level GC

Page 48: Lecture 16: Multi-Version Concurrency Control

48 / 66

Multi-Version Concurrency Control Garbage Collection

Tuple-level GC

Page 49: Lecture 16: Multi-Version Concurrency Control

49 / 66

Multi-Version Concurrency Control Garbage Collection

Tuple-level GC

• Cooperative Cleaning:• Worker threads identify reclaimable versions as they traverse version chain.• Only works with O2N.

Page 50: Lecture 16: Multi-Version Concurrency Control

50 / 66

Multi-Version Concurrency Control Garbage Collection

Tuple-level GC

Page 51: Lecture 16: Multi-Version Concurrency Control

51 / 66

Multi-Version Concurrency Control Garbage Collection

Tuple-level GC

Page 52: Lecture 16: Multi-Version Concurrency Control

52 / 66

Multi-Version Concurrency Control Garbage Collection

Tuple-level GC

Page 53: Lecture 16: Multi-Version Concurrency Control

53 / 66

Multi-Version Concurrency Control Garbage Collection

Tuple-level GC

Page 54: Lecture 16: Multi-Version Concurrency Control

54 / 66

Multi-Version Concurrency Control Garbage Collection

Transaction-level GC

• Each txn keeps track of its read/write set.• The DBMS determines when all versions created by a finished txn are no longer visible.

Page 55: Lecture 16: Multi-Version Concurrency Control

55 / 66

Multi-Version Concurrency Control Index Management

Index Management

Page 56: Lecture 16: Multi-Version Concurrency Control

56 / 66

Multi-Version Concurrency Control Index Management

Index Management

• Primary key indexes point to version chain head.▶ How often the DBMS has to update the pkey index depends on whether the system

creates new versions when a tuple is updated.▶ If a txn updates a tuple’s pkey attribute(s), then this is treated as an DELETE followed by

an INSERT.

• Secondary indexes are more complicated. . .

Page 57: Lecture 16: Multi-Version Concurrency Control

57 / 66

Multi-Version Concurrency Control Index Management

Secondary Indexes

• Approach 1: Physical Pointers▶ Use the physical address to the version chain head.

• Approach 2: Logical Pointers▶ Use a fixed identifier per tuple that does not change.▶ Requires an extra indirection layer.▶ Primary Key vs. Tuple Id

Page 58: Lecture 16: Multi-Version Concurrency Control

58 / 66

Multi-Version Concurrency Control Index Management

Physical Pointers

Page 59: Lecture 16: Multi-Version Concurrency Control

59 / 66

Multi-Version Concurrency Control Index Management

Physical Pointers

Page 60: Lecture 16: Multi-Version Concurrency Control

60 / 66

Multi-Version Concurrency Control Index Management

Physical Pointers

Page 61: Lecture 16: Multi-Version Concurrency Control

61 / 66

Multi-Version Concurrency Control Index Management

Physical Pointers

Page 62: Lecture 16: Multi-Version Concurrency Control

62 / 66

Multi-Version Concurrency Control Index Management

Logical Pointers

Page 63: Lecture 16: Multi-Version Concurrency Control

63 / 66

Multi-Version Concurrency Control Index Management

Logical Pointers

Page 64: Lecture 16: Multi-Version Concurrency Control

64 / 66

Multi-Version Concurrency Control Index Management

MVCC Implementations

DBMS Protocol Version Storage Garbage Collection Indexes

Oracle MV2PL Delta Vacuum LogicalPostgres MV-2PL/MV-TO Append-Only Vacuum PhysicalMySQL-InnoDB MV-2PL Delta Vacuum LogicalHYRISE MV-OCC Append-Only – PhysicalHekaton MV-OCC Append-Only Cooperative PhysicalMemSQL MV-OCC Append-Only Vacuum PhysicalSAP HANA MV-2PL Time-travel Hybrid LogicalNuoDB MV-2PL Append-Only Vacuum LogicalHyPer MV-OCC Delta Txn-level Logical

Page 65: Lecture 16: Multi-Version Concurrency Control

65 / 66

Multi-Version Concurrency Control Index Management

Conclusion

• MVCC is the widely used scheme in DBMSs.• Even systems that do not support multi-statement txns (e.g., NoSQL) use it.

Page 66: Lecture 16: Multi-Version Concurrency Control

66 / 66

Multi-Version Concurrency Control Index Management

Next Class

• Advanced topics in Concurrency Control