Fast Scans on Key-Value Stores
James Lennon
1. What is the problem?
OR
Can we get both?
2. Why is this important?
Increased adoption of NoSQL KVS
While also using SQL column stores
Why this isn’t great
● Time consuming
● Requires expertise
● Harder to maintain
3. Why is this hard?
Analytical Queries
● High locality
● Column oriented
● Greedy updates
● Compact representation
Transactional Queries
● Differential data structures
● Row oriented
● Lazy updates
● Sparse indexes
Difficulties when adding scans to KVS
● Versioning
● Batching
● Design space
4. Why are previous solutions insufficient?
5. Core intuition for the solution
Overview
1. Amend the SQL-over-NoSQL architecture
2. Discuss the design space
3. Implement TellStore-Log (transaction optimized)
4. Implement TellStore-Col (analytics optimized)
SQL-over-NoSQL
● KVS must support scans (selections, simple aggregates, projections)
● Multiversion concurrency control
● Batching
● Asynchronous IO
Design space
● Updates: Update-in-place, Log-structured, Delta-main
● Records: Row-major, Column-major (PAX)
● Versioning: Clustered versions, Chained versions
● Garbage Collection: Periodic, Piggy-backed
Updates × Records × Versioning × Garbage Collection
Design space: Extremes
● TellStore-Log: Log-structured, Row-major, Chained versions, Piggy-backed GC
● TellStore-Col: Delta-main, Column-major, Clustered versions, Periodic GC
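The four dimensions multiply into a cross product of candidate designs. A minimal Python sketch, using hypothetical identifiers (`DIMENSIONS`, `design_space`, `TELLSTORE_LOG`, `TELLSTORE_COL`) that are not from the slides, enumerates the combinations and places the two TellStore variants in that space:

```python
from itertools import product

# Option names come from the slides; the identifiers are illustrative only.
DIMENSIONS = {
    "updates": ["update-in-place", "log-structured", "delta-main"],
    "records": ["row-major", "column-major (PAX)"],
    "versioning": ["clustered", "chained"],
    "gc": ["periodic", "piggy-backed"],
}


def design_space():
    """Return every combination of one option per dimension as a dict."""
    names = list(DIMENSIONS)
    return [dict(zip(names, combo)) for combo in product(*DIMENSIONS.values())]


TELLSTORE_LOG = {"updates": "log-structured", "records": "row-major",
                 "versioning": "chained", "gc": "piggy-backed"}
TELLSTORE_COL = {"updates": "delta-main", "records": "column-major (PAX)",
                 "versioning": "clustered", "gc": "periodic"}

if __name__ == "__main__":
    space = design_space()
    print(len(space))  # 3 * 2 * 2 * 2 combinations
    print(TELLSTORE_LOG in space, TELLSTORE_COL in space)
```

The count only describes the size of the space; it says nothing about which combinations are practical. The two variants differ in every dimension, which is why the slides call them extremes.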
TellStore-Log
● Hash table: linear probing
● Log: append only; lock-free
● Log entries: self-contained
TellStore-Log: Insert Example
1. Start: the hash table and the log do not yet contain foo.
2. Append the entry to the log:
   foo null 102 null bar
3. Add foo to the hash table.
TellStore-Log: Update Example
1. Start: the hash table holds foo and the log holds:
   foo null 102 null bar
2. After the update, the log holds both versions:
   foo null 102 104 bar
   foo ptr 104 null bar2
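The insert and update examples can be sketched in a few lines of Python. This is a single-threaded illustration, not the lock-free implementation. It assumes the five fields of each log row are key, previous-version pointer, valid-from version, valid-to version, and value; the slides do not label them. `LogStore` and `LogEntry` are hypothetical names, and a Python `dict` stands in for the hash table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEntry:
    key: str
    prev: Optional[int]       # log position of the previous version, or None
    valid_from: int
    valid_to: Optional[int]   # None while this is the newest version
    value: str


class LogStore:
    def __init__(self):
        self.log = []     # entries are only ever appended
        self.index = {}   # key -> log position of the newest version

    def insert(self, key, value, version):
        if key in self.index:
            raise KeyError(f"{key!r} already exists")
        self.log.append(LogEntry(key, None, version, None, value))
        self.index[key] = len(self.log) - 1

    def update(self, key, value, version):
        old_pos = self.index[key]
        self.log[old_pos].valid_to = version   # close the old version
        self.log.append(LogEntry(key, old_pos, version, None, value))
        self.index[key] = len(self.log) - 1

    def get(self, key, snapshot):
        """Walk the version chain until a version visible at `snapshot` is found."""
        pos = self.index.get(key)
        while pos is not None:
            e = self.log[pos]
            if e.valid_from <= snapshot and (e.valid_to is None or snapshot < e.valid_to):
                return e.value
            pos = e.prev
        return None


if __name__ == "__main__":
    store = LogStore()
    store.insert("foo", "bar", 102)
    store.update("foo", "bar2", 104)
    print(store.get("foo", 103), store.get("foo", 104))
```

A reader at snapshot 103 follows the chain from the newest entry back to the version written at 102, while a reader at 104 stops at the first entry.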
TellStore-Log: Garbage Collection
1. Get “health” of pages
   a. Mark if below threshold
2. Move and update to head of log
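The two steps above can be sketched as follows. The sketch assumes that page "health" means the fraction of entries that are still valid and that the threshold is 0.5; the slides specify neither. `page_health` and `collect` are hypothetical names, and the function runs on its own rather than piggy-backed on another operation.

```python
def page_health(page):
    """Fraction of entries on the page that are still valid (1.0 for an empty page)."""
    if not page:
        return 1.0
    return sum(1 for e in page if e["valid"]) / len(page)


def collect(pages, index, threshold=0.5):
    """Move valid entries from unhealthy pages to a new head page.

    pages: list of pages, each a list of {"key", "value", "valid"} dicts
    index: dict mapping key -> (page_no, slot); updated in place
    """
    # Step 1: mark pages whose health is below the threshold.
    marked = [i for i, p in enumerate(pages) if page_health(p) < threshold]
    if not marked:
        return pages

    # Step 2: copy surviving entries to the head of the log and fix the index.
    head = []
    head_no = len(pages)
    for i in marked:
        for e in pages[i]:
            if e["valid"]:
                head.append(e)
                index[e["key"]] = (head_no, len(head) - 1)
        pages[i] = []   # the old page can now be reused
    pages.append(head)
    return pages
```

Marked pages are emptied in place so that page numbers of untouched pages, and therefore their index entries, stay valid.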
TellStore-Log: Summary
● Log structure -> Fast put operations
● Hash Table -> Fast get operations
● Snapshot Isolation -> Higher throughput, concurrent queries without locks
● Self-contained entries -> Improve scan performance
● Lazy GC -> Improve scan performance, keep memory in check
TellStore-Col
● Main: read-only; read optimized
● Insert log: append-only; write optimized
● Update log: append-only; write optimized
TellStore-Col: Main Storage (PAX)
● Fixed-size data is column-major
● Var-size data is indexed column-wise but stored in row-major format
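A small Python sketch may make the layout concrete. It assumes a hypothetical four-field schema (`id`, `price`, `name`, `comment`) and a single page; `PaxPage` and its methods are illustrative names, not TellStore's.

```python
class PaxPage:
    """One page: fixed-size fields column-major, var-size fields in a row-major heap."""

    def __init__(self, records):
        # records: list of (id: int, price: float, name: str, comment: str)
        # Fixed-size fields: one contiguous column per field.
        self.ids = [r[0] for r in records]
        self.prices = [r[1] for r in records]

        # Var-size fields: bytes are laid out row by row in one heap,
        # while each column keeps its own (offset, length) index.
        self.heap = bytearray()
        self.name_idx = []
        self.comment_idx = []
        for _, _, name, comment in records:
            for idx, text in ((self.name_idx, name), (self.comment_idx, comment)):
                data = text.encode("utf-8")
                idx.append((len(self.heap), len(data)))
                self.heap += data

    def scan_prices(self, predicate):
        """Selection over a fixed-size column; touches only that column."""
        return [row for row, p in enumerate(self.prices) if predicate(p)]

    def name(self, row):
        offset, length = self.name_idx[row]
        return self.heap[offset:offset + length].decode("utf-8")


if __name__ == "__main__":
    page = PaxPage([(1, 9.5, "ann", "x"), (2, 20.0, "bob", "hello")])
    rows = page.scan_prices(lambda p: p > 10)
    print([page.name(r) for r in rows])
```

A selection on `price` reads one contiguous column, and materializing a matching row's variable-size fields reads adjacent bytes in the heap.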
TellStore-Col: Versioning
● Main & Insert log have newest pointer
● Update log has previous pointer
● In main, different versions of same key stored contiguously, newest to oldest
TellStore-Col: Insert Example
1. Start: the hash table holds abc, and main storage holds:
   abc xyz null
2. Append the entry to the insert log:
   foo 102 null null bar
3. Add foo to the hash table.
TellStore-Col: Update Example
1. Start: the hash table holds abc and foo; the insert log holds:
   foo 102 null null bar
2. Close the old version at 104 and append the new version, with a previous pointer, to the update log:
   foo 102 104 null bar
   foo 104 null ptr bar2
3. Set the newest pointer on the insert-log entry:
   foo 102 104 ptr bar
   foo 104 null ptr bar2
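The delta part of this example can be sketched in Python. The sketch is single-threaded, leaves out main storage, and assumes the row fields are key, valid-from, valid-to, pointer, and value, with the pointer meaning "newest" in the insert log and "previous" in the update log. `ColDelta`, `InsertEntry`, and `UpdateEntry` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class InsertEntry:
    key: str
    valid_from: int
    valid_to: Optional[int]
    newest: Optional["UpdateEntry"]   # newest version in the update log, if any
    value: str


@dataclass(eq=False)
class UpdateEntry:
    key: str
    valid_from: int
    valid_to: Optional[int]
    previous: Optional[object]        # older version (UpdateEntry or InsertEntry)
    value: str


class ColDelta:
    def __init__(self):
        self.insert_log = []
        self.update_log = []
        self.index = {}   # key -> InsertEntry (a dict stands in for the hash table)

    def insert(self, key, value, version):
        entry = InsertEntry(key, version, None, None, value)
        self.insert_log.append(entry)
        self.index[key] = entry

    def update(self, key, value, version):
        base = self.index[key]
        current = base.newest or base
        current.valid_to = version                      # close the old version
        new = UpdateEntry(key, version, None, current, value)
        self.update_log.append(new)
        base.newest = new                               # set the newest pointer

    def get(self, key, snapshot):
        base = self.index.get(key)
        entry = (base.newest or base) if base is not None else None
        while entry is not None:
            if entry.valid_from <= snapshot and (
                    entry.valid_to is None or snapshot < entry.valid_to):
                return entry.value
            entry = entry.previous if isinstance(entry, UpdateEntry) else None
        return None
```

Calling `insert("foo", "bar", 102)` followed by `update("foo", "bar2", 104)` reproduces the two rows from the slide: the insert-log entry ends with valid-to 104 and a newest pointer, and the update-log entry points back to it.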
TellStore-Col: Garbage Collection
● Dedicated thread
● Scans over main
  ○ Aggressive; rewrites page if single entry out of date
  ○ Adds valid previous entries from update log
● Scans over insert-log
● Truncates logs
TellStore-Col: Summary
● Delta-main -> Compromise between scans and updates
● PAX structured main -> Minimize disk IO while still having data locality
● Separate insert and update logs -> More efficient garbage collection
● Aggressive GC -> Improve scan performance, keep memory in check
TellStore: Implementation Details
6. Does the paper prove its claims?
Clear Justifications and Considerations
● Clearly explain design options and tradeoffs involved
● Consistently choose good compromises between transactions & scans
● Provide two designs, “extreme” hybrids to optimize for either
7. Setup of experiments? Are they sufficient?
Comparing Transactional Throughput
Investigating Batch Size
Comparing Query Response Time
8/9. Gaps in the logic / proof? Possible next steps?
Next Steps
● How do their designs perform with disk storage?
● Data replication (high availability)
● Distributed queries
● Less aggressive GC variants