Optimized Transaction Time Versioning Inside a Database Engine
Intern: Feifei Li, Boston University
Mentor: David Lomet, MSR
Transaction Time Support
- Provide access to prior states of a database:
  - Auditing the database
  - Querying the historical data
  - Mining the pattern of changes to a database
- General approaches:
  - Build it outside the database engine
  - Build it inside the database engine
Overview of a Versioned Database
[Figure: a database page with a page header and a dynamic slot array; each slot points to a version chain of a record, e.g., A.1 -> A.0 and B.2 -> B.1 -> B.0.]
Key Challenges
- Timestamping
  - Eager timestamping vs. lazy timestamping: each record takes the transaction commit timestamp
  - Recovery of the timestamping information when the system crashes
- Indexing both current versions and historical versions simultaneously
  - Storage utilization
  - Query efficiency
Talk Outline
- Even "lazier" timestamping
- Deferred-key-split policy in the TSB tree
- Auditing the database
Lazy Timestamping
- When do we timestamp the records affected by a transaction?
  - Maintaining a list of updated records and timestamping them when the transaction commits may lead to additional I/Os
  - Instead, timestamp records lazily, when they are later accessed by other queries, updates, page reads, and page writes
- Where does the timestamping information come from?
  - The volatile timestamp table (VTT) and the persistent timestamp table (PTT)
[Figure: Transaction 23 begins and inserts records A and B; each record is stamped with the placeholder TID.23, and the in-memory VTT entry for TID 23 goes from (Ttime = NA, Refcnt = 0) to Refcnt = 2. When the transaction commits, the VTT entry's Ttime becomes 178432 and (23, 178432) is written to the on-disk PTT.]
This ensures that we can recover the timestamping information if the system crashes (the VTT is gone!).
Timestamping the Record
[Figure: Transaction 45 begins, inserts record C (stamped TID.45), and updates record A. The access to A finds the placeholder TID.23, looks up TID 23 in the VTT/PTT, rewrites A's timestamp to the commit time 178432, and decrements the Refcnt for TID 23. An update to record D likewise replaces its placeholder TID.88 with the commit time 342234. When transaction 45 commits, its VTT entry gets Ttime = 923121 and (45, 923121) is written to the PTT.]
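The walkthroughs above can be sketched in a few lines. This is a minimal illustration, not the engine's actual code: `vtt`, `ptt`, and the function names are invented for the sketch.

```python
# Sketch of lazy timestamping. Records written by an in-flight transaction
# carry a TID placeholder; the real commit time is patched in lazily, on a
# later access, using the VTT (in memory) or the PTT (on disk, after a crash).

class Record:
    def __init__(self, key, tid):
        self.key = key
        self.timestamp = ("TID", tid)    # placeholder until lazily resolved

vtt = {}   # volatile timestamp table: tid -> {"ttime": ..., "refcnt": ...}
ptt = {}   # persistent timestamp table: tid -> commit time

def begin(tid):
    vtt[tid] = {"ttime": None, "refcnt": 0}

def write(tid, key):
    vtt[tid]["refcnt"] += 1              # one more record carries this TID
    return Record(key, tid)

def commit(tid, commit_time):
    vtt[tid]["ttime"] = commit_time
    ptt[tid] = commit_time               # original scheme: persist at commit

def access(record):
    """Resolve the placeholder, if any, on a later access."""
    if isinstance(record.timestamp, tuple):
        _, tid = record.timestamp
        entry = vtt.get(tid)
        # After a crash the VTT is gone, so fall back to the PTT.
        record.timestamp = entry["ttime"] if entry else ptt[tid]
        if entry:
            entry["refcnt"] -= 1         # refcnt == 0 -> garbage-collectible
    return record.timestamp
```

Replaying the slides: after `begin(23)`, two writes, and `commit(23, 178432)`, accessing record A rewrites its placeholder to 178432 and drops the refcount for TID 23.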
The Checkpointing Process
[Figure: timeline of the (k-2)-th, (k-1)-th, k-th, and (k+1)-th checkpoints, with log positions LSN(P), LSN(U), and EOL.]
- All log records before LSN(P) have been removed from the log; it is impossible to recover information earlier than LSN(P).
- The dirty pages between LSN(P) and LSN(U) have all been flushed to disk prior to the current checkpoint.
- The current checkpoint may not have finished yet, and log records with LSNs between LSN(U) and EOL are not guaranteed to be stable.
Garbage Collection
[Figure: timeline with LSN(P), LSN(U), and EOL across checkpoints k-2, k-1, and k. Each entry is tagged with rcz_lsn, the LSN at which its Refcnt reached zero.]
- RefCnt != 0: do nothing in the PTT; keep the VTT entry.
- RefCnt == 0: once the entry's rcz_lsn is old enough to be stable, delete the entry from the PTT and drop it from the VTT; until then, do nothing and keep it.
- At the end of the current checkpoint interval (EOL), update LSN(U) = EOL and LSN(P) = the old LSN(U).
Let's Be Even Lazier
- Don't write an entry to the PTT when a transaction commits
  - Piggyback the timestamping information onto the commit log record, so we can still recover it if necessary
- Batch-update entries from the VTT to the PTT at the checkpoint
- Why is this better?
  - A batched update in one transaction is faster than writing to the PTT on a per-transaction basis
  - Many entries have their Refcnt drop to zero by checkpoint time, so fewer writes to the PTT are needed
The New Story
[Figure: Transaction 23 begins, inserts record A (stamped TID.23), and commits; its VTT entry becomes (23, 178432, Refcnt = 1) but nothing is written to the PTT. Transaction 76 then begins, inserts record B (stamped TID.76), commits with Ttime = 287544, and updates A, rewriting A's timestamp to 178432 and dropping the Refcnt for TID 23 to zero. Only at the next checkpoint are the surviving VTT entries batch-written to the PTT.]
Be Careful When Updating the VTT and PTT at the Checkpoint
[Figure: timeline with LSN(P), LSN(U), and EOL across checkpoints k-2, k-1, and k. Each entry is now tagged with cmt_lsn, the LSN of its commit log record. For RefCnt != 0: while the cmt_lsn is recent, do nothing in the PTT and keep the VTT entry; once the commit log record is about to be truncated from the log, insert the entry into the PTT and keep the VTT entry. At EOL, update LSN(U) = EOL and LSN(P) = the old LSN(U).]
Be Careful When Updating the VTT and PTT at the Checkpoint (continued)
[Figure: the RefCnt == 0 cases, with each entry tagged by both its cmt_lsn and its rcz_lsn. Depending on where the two LSNs fall relative to LSN(P) and LSN(U), the entry is either kept unchanged, inserted into the PTT, deleted from the PTT and dropped from the VTT, or dropped from the VTT only. At EOL, update LSN(U) = EOL and LSN(P) = the old LSN(U).]
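The checkpoint rules above can be sketched as a single batch pass over the VTT. This is a simplified illustration: the field names are invented, and "stable" is approximated here as "older than LSN(P)", whereas the actual scheme distinguishes the LSN(P)/LSN(U)/EOL intervals shown in the figures.

```python
# Sketch of the checkpoint-time VTT -> PTT maintenance in the "even lazier"
# scheme. vtt maps tid -> {"ttime", "refcnt", "cmt_lsn", "rcz_lsn"}, where
# cmt_lsn is the LSN of the commit log record and rcz_lsn is the LSN at
# which the refcount reached zero (None while refcnt > 0).

def checkpoint(vtt, ptt, lsn_p):
    """Batch-update the PTT from the VTT in one pass."""
    for tid in list(vtt):
        e = vtt[tid]
        if e["refcnt"] != 0:
            # Entry still needed: persist it before the commit log record
            # (which carries the timestamp) can be truncated from the log.
            if e["cmt_lsn"] < lsn_p and tid not in ptt:
                ptt[tid] = e["ttime"]
        else:
            # Refcount reached zero: no record still carries this TID
            # placeholder, so once that fact is stable the entry can go.
            if e["rcz_lsn"] < lsn_p:
                ptt.pop(tid, None)   # delete from the PTT (if it got there)...
                del vtt[tid]         # ...and drop the VTT entry
```

This captures why batching helps: entries whose refcount hits zero before the checkpoint (like TID 23 in the figure) never cost a PTT write at all.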
Improvement
- Each record is 200 bytes
- The database is initialized with 5,000 records
- Generated workloads contain up to 10,000 transactions
- Each transaction is an insert or an update (of a record newly inserted by another transaction)
- One checkpoint every 500 transactions
- Cost metrics: execution time, number of writes to the PTT, number of batched updates
Execution Time
[Chart omitted. Audit mode: always keep everything in the PTT.]
Number of Writes to PTT
[Chart omitted.]
Batched Update Analysis
[Chart omitted.]
Talk Outline
- Even "lazier" timestamping
- Deferred-key-split policy in the TSB tree
- Auditing the database
Time-Split B (TSB) Tree
- Indexes both current-version pages and historical-version pages simultaneously
- Time split: create a new page; the historical records in the current page are pushed into the new page
- Key split: proceeds as a normal B+-tree key split
- When should we do a time split, and when a key split?
What Happens Now
[Figure: Record C is inserted into a full current page. A time split pushes the historical versions (A.0, B.0, B.1, ...) into a new historical page, and C goes into the current page. If the current page then exceeds the key-split threshold, a key split follows immediately.]
Why Do We Need a Key-Split Threshold?
- If we wait until the page is full and then key split, we get too many time splits, and hence lots of replicas among the historical versions
- What is the best value for the key-split threshold?
  - Too high: overall utilization drops
  - Too low: current-version utilization is reduced
  - We must find a balance
Could We Do Better?
- A key split that immediately follows the time split leads to two pages with utilization 0.5 * thresh_ksplit
- If the new pages are not filled up quickly, storage utilization is wasted for no good reason
- A fix: defer the key split until the next time the page requires one
  - Behave as if the key split had been performed on the previous occasion, when the page first qualified
Deferring the Key Split
[Figure: Record C is inserted into a full page, triggering a time split into a historical page. Although the current page exceeds the key-split threshold, we still just insert C. When the page fills again (here, on an update to record D), we key split, because the page already satisfied the key-split requirement last time; we use the key-split value from that previous occasion.]
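The deferral policy can be sketched with a toy page model. This is illustrative only: real TSB-tree pages are slotted disk pages, the capacity and threshold values here are arbitrary, and the key split is merely recorded as an action rather than actually redistributing records by key.

```python
# Sketch of the deferred-key-split policy on a toy page.
# A "page" is {"records": [...], "deferred": bool}; each record has a
# "current" flag distinguishing current from historical versions.

def insert(page, record, capacity=6, threshold=4):
    """Insert a record, performing splits first if the page is full.
    Returns the list of split actions taken, in order."""
    actions = []
    if len(page["records"]) >= capacity:
        # Time split: move historical (non-current) versions to a new page.
        historical = [r for r in page["records"] if not r["current"]]
        page["records"] = [r for r in page["records"] if r["current"]]
        actions.append(("time_split", historical))
        if len(page["records"]) >= threshold:
            if page["deferred"]:
                actions.append("key_split")  # deferred from the last occasion
                page["deferred"] = False
            else:
                page["deferred"] = True      # defer: just remember it qualified
    page["records"].append(record)
    return actions
```

On the first overflow the page only time splits and remembers that it qualified for a key split; the key split actually happens when the page next fills up, matching the figure above.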
Analytical Result
We can show the following:

SVCU^max_defer = SVCU^max_no-defer + [in / (up * cr + in)] * (1 - SVCU^max_no-defer)

An analogous expression, with additional factors of ln 2 (the average-to-maximum utilization ratio of B-tree-style pages), relates SVCU^avg_defer to SVCU^avg_no-defer.
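As a worked example, the max-SVCU formula (as reconstructed above) can be evaluated for sample ratios. The numbers below are illustrative, not from the experiments in this talk.

```python
# Worked example of the maximum single-version current utilization (SVCU)
# gained by deferring key splits, per the formula above. All input values
# are illustrative.

def svcu_defer_max(svcu_no_defer_max, ins, up, cr):
    """SVCU^max_defer = SVCU^max_no-defer
                        + in/(up*cr + in) * (1 - SVCU^max_no-defer)."""
    gain = ins / (up * cr + ins) * (1.0 - svcu_no_defer_max)
    return svcu_no_defer_max + gain

# 50% inserts, 50% updates, compression ratio 0.9, baseline max SVCU 0.5:
print(round(svcu_defer_max(0.5, 0.5, 0.5, 0.9), 4))  # -> 0.7632
```

Note the shape of the formula: as the workload becomes more update-heavy (up grows relative to in), the utilization gain from deferring shrinks.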
The Goal of Our Design
- Ensure that, for any particular version, the version utilization is kept above a specified threshold value
Experiment
- 50,000 transactions; each transaction inserts or updates a record
- Varying the insert/update ratio in the workload
- Each record is 200 bytes
- Delta compression is used on the historical versions (they share many common bits with the newer versions)
Single-Version Current Utilization (SVCU)
[Chart: "SVCU: Deferred Key Split vs. No Deferred Key Split"; x-axis: percent of updates (0 to 1), y-axis: SVCU (0.35 to 0.75); series: uncompressed defer, uncompressed no defer, analytical uncompressed, CR=90% defer, CR=90% no defer, analytical CR=90%.]
Multi-Version Utilization (MVU)
[Chart: "MVU: Deferred Key Split vs. No Deferred Key Split"; x-axis: percent of updates (0 to 1), y-axis: MVU (0.3 to 3.3); series: uncompressed defer, uncompressed no defer, CR=50% defer, CR=50% no defer, CR=90% defer, CR=90% no defer.]
Talk Outline
- Even "lazier" timestamping
- Deferred-key-split policy in the TSB tree
- Auditing the database
Auditing a Database
- Transaction-time versioning support enables checking any prior state of the database
- Store the user id in the PTT for each transaction entry
  - Any change to the database becomes traceable
  - The user id is taken from the current session that the transaction belongs to
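The auditing idea can be sketched by extending each PTT entry with the user id of the committing session. The schema and lookup below are hypothetical illustrations, not the actual catalog layout.

```python
# Sketch of auditing via the PTT: each entry carries the commit time and
# the user id of the session that ran the transaction (hypothetical layout:
# tid -> (commit_time, user_id)).

ptt = {
    23: (178432, "alice"),
    45: (923121, "bob"),
}

def who_changed(record_timestamp):
    """Map a record version's commit timestamp back to the responsible
    transaction and user; returns None if no matching entry exists."""
    for tid, (ttime, user) in ptt.items():
        if ttime == record_timestamp:
            return tid, user
    return None
```

Since every record version carries a commit timestamp, this lookup makes any change to the database traceable to a user.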
Conclusion
- Transaction-time versioning support inside a database engine is now one step closer to being practical
- What other interesting applications become possible with transaction-time versioning support?
Thanks!