YCSB++: Benchmarking and Performance Debugging Advanced Features in Scalable Table Stores
Swapnil Patil, Milo Polte, Wittawat Tantisiriroj, Kai Ren, Lin Xiao, Julio Lopez, Garth Gibson, Adam Fuchs *, Billie Rinaldi *
Carnegie Mellon University * National Security Agency
Open Cirrus Summit, October 2011, Atlanta GA
Scalable table stores are critical systems
• For data processing & analysis (e.g., Pregel, Hive)
• For systems services (e.g., Google Colossus metadata)
Evolution of scalable table stores
Simple, lightweight stores -> complex, feature-rich stores
• Supports a broader range of applications and services
• Hard to debug and understand performance problems
• Complex behavior and interactions among various components
Growing set of HBase features
[Figure: Timeline of HBase releases, 2008 to 2011+, showing features added over time: batch updates, bulk load tools, RangeRowFilters, RegEx filtering, scan optimizations, co-processors, and access control]
YCSB++ FUNCTIONALITY
• ZooKeeper-based distributed and coordinated testing
• API extensions for the new Apache ACCUMULO DB
• Fine-grained, correlated monitoring using OTUS
FEATURES TESTED USING YCSB++
• Batch writing
• Table pre-splitting
• Bulk loading
• Weak consistency
• Server-side filtering
• Fine-grained security
Tool released at http://www.pdl.cmu.edu/ycsb++
Need richer tools for understanding advanced features in table stores …
Outline
• Problem
• YCSB++ design
• Illustrative examples
• Ongoing work and summary
Yahoo Cloud Serving Benchmark [Cooper2010]
• For CRUD (create-read-update-delete) benchmarking
• Single-node system with an extensible API (a binding sketch follows the figure below)
[Figure: YCSB architecture. A workload executor runs client threads with statistics collection and DB clients (HBASE and other DBs), configured by command-line parameters and a workload parameter file, against the storage servers.]
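To show what the extensible API looks like, here is a minimal sketch of a database binding. The method names and int return codes approximate the com.yahoo.ycsb.DB abstract class of this era, and NoopDB itself is a hypothetical placeholder binding, not one shipped with the tool.

import java.util.HashMap;
import java.util.Set;
import java.util.Vector;

import com.yahoo.ycsb.ByteIterator;
import com.yahoo.ycsb.DB;

// Hypothetical no-op binding: shows the shape of a YCSB DB client.
// Each method returns 0 to signal success, as bindings of this era did.
public class NoopDB extends DB {
  @Override
  public int read(String table, String key, Set<String> fields,
                  HashMap<String, ByteIterator> result) {
    return 0; // a real binding would issue a get() against the store
  }

  @Override
  public int scan(String table, String startkey, int recordcount,
                  Set<String> fields,
                  Vector<HashMap<String, ByteIterator>> result) {
    return 0; // a real binding would issue a range scan
  }

  @Override
  public int update(String table, String key,
                    HashMap<String, ByteIterator> values) {
    return 0;
  }

  @Override
  public int insert(String table, String key,
                    HashMap<String, ByteIterator> values) {
    return 0;
  }

  @Override
  public int delete(String table, String key) {
    return 0;
  }
}

A real binding, like the HBASE client shipped with YCSB, translates each call into the store's native API.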
YCSB++: New extensions
Added support for the new Apache ACCUMULO DB
− New parameters and workload executors
[Figure: YCSB++ architecture. The YCSB workload executor and DB clients, extended with new parameters, workload executors, and an ACCUMULO client binding.]
YCSB++: Distributed & parallel tests
Multi-client, multi-phase coordination using ZooKeeper
− Enables testing at large scales and testing asymmetric features (a coordination sketch follows the figure below)
[Figure: YCSB++ architecture with multiple YCSB clients coordinated through ZooKeeper for multi-phase tests against the table store servers.]
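As one concrete coordination pattern, here is a minimal sketch of a ZooKeeper-based phase barrier in which each client checks in under a phase znode and waits for all peers. The paths, class name, and polling loop are illustrative assumptions, not YCSB++'s actual implementation.

import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical phase barrier: blocks until numClients have checked in.
// Assumes the phasePath parent znode already exists.
public class PhaseBarrier {
  private final ZooKeeper zk;
  private final String phasePath; // e.g., "/ycsb/phase-3" (illustrative)
  private final int numClients;

  public PhaseBarrier(ZooKeeper zk, String phasePath, int numClients) {
    this.zk = zk;
    this.phasePath = phasePath;
    this.numClients = numClients;
  }

  public void enter(String clientId)
      throws KeeperException, InterruptedException {
    // Announce this client's arrival with an ephemeral znode.
    zk.create(phasePath + "/" + clientId, new byte[0],
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    // Poll until every client has arrived (a watch would avoid polling).
    while (true) {
      List<String> arrived = zk.getChildren(phasePath, false);
      if (arrived.size() >= numClients) return;
      Thread.sleep(100);
    }
  }
}

Multi-phase tests can enter such a barrier between phases so that, for example, all clients finish loading before any client starts measuring.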
YCSB++: Collective monitoring
OTUS monitor built on Ganglia [Ren2011]
− Collects information from YCSB, the table stores, HDFS, and the OS
[Figure: YCSB++ architecture with OTUS monitoring collecting correlated metrics across the YCSB clients, table store servers, HDFS, and the OS.]
Example of YCSB++ debugging
OTUS collects fine-grained information
− Both the HDFS process and the TabletServer process on the same node
[Figure: Monitoring resource usage and table-store metrics over time: HDFS DataNode CPU usage (%), Accumulo TabletServer CPU usage (%), and the average number of StoreFiles per tablet.]
Outline
• Problem
• YCSB++ design
• Illustrative examples
− YCSB++ on HBASE and ACCUMULO (Bigtable-like stores)
• Ongoing work and summary
Recap of Bigtable-like table stores
[Figure: Tablet servers on HDFS nodes. Data insertion (1) goes to a write-ahead log and an in-memory Memtable; minor compaction (2) flushes Memtables to sorted, indexed files in HDFS; major compaction (3) merges them into fewer sorted, indexed files.]
Write path: in-memory buffering & async FS writes
1) Mutations logged in memory tables (in unsorted order)
2) Minor compaction: Memtables -> sorted, indexed files in HDFS
3) Major compaction: LSM-tree-based file merging in the background
Read path: look up both memtables and on-disk files (a toy sketch follows)
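To make the write path concrete, here is a toy, self-contained sketch of a memtable with minor compaction. It is illustrative only (real stores write the WAL first, buffer mutations unsorted, and sort at flush time) and is not code from HBASE or ACCUMULO.

import java.io.IOException;
import java.io.PrintWriter;
import java.util.Map;
import java.util.TreeMap;

// Toy LSM write path: buffer mutations in a sorted in-memory map,
// then "minor compact" by flushing it to a sorted file on disk.
public class ToyMemtable {
  private TreeMap<String, String> memtable = new TreeMap<>();
  private final int flushThreshold;
  private int flushCount = 0;

  public ToyMemtable(int flushThreshold) {
    this.flushThreshold = flushThreshold;
  }

  public void put(String key, String value) throws IOException {
    memtable.put(key, value); // TreeMap keeps keys sorted
    if (memtable.size() >= flushThreshold) {
      minorCompaction();
    }
  }

  // Flush the memtable to a sorted, immutable file (a "StoreFile").
  private void minorCompaction() throws IOException {
    String file = "storefile-" + (flushCount++) + ".txt";
    try (PrintWriter out = new PrintWriter(file)) {
      for (Map.Entry<String, String> e : memtable.entrySet()) {
        out.println(e.getKey() + "\t" + e.getValue());
      }
    }
    memtable = new TreeMap<>(); // start a fresh memtable
  }
}

Major compaction would then merge several of these sorted files into fewer ones, which is exactly the background work whose cost the later examples measure.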
Apache ACCUMULO
Started at NSA; now an Apache Incubator project
− Designed for high-speed ingest and scan workloads
− http://incubator.apache.org/projects/accumulo.html
New features in ACCUMULO
− Iterator framework for user-specified programs placed between different stages of the DB pipeline (a sketch follows below)
  E.g., supports joins and stream processing using iterators
− Also supports fine-grained, cell-level access control
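As a sketch of what an iterator can look like, the filter below drops cells with empty values. It assumes Accumulo's Filter base class (org.apache.accumulo.core.iterators.Filter); the class name and predicate are hypothetical.

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;

// Hypothetical server-side iterator: keeps only cells with non-empty
// values. Accumulo applies such filters at scan and compaction time.
public class NonEmptyValueFilter extends Filter {
  @Override
  public boolean accept(Key k, Value v) {
    return v.getSize() > 0; // drop empty cells server-side
  }
}

Because such iterators run on the tablet servers, filtering happens before any data crosses the network to the client.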
ILLUSTRATIVE EXAMPLE #1
Analyzing the fast-inserts vs. weak-consistency tradeoff using YCSB++
Client-side batch writing
Feature: clients batch inserts, delaying writes to the server
• Improves insert throughput and latency
• Newly inserted data may not be immediately visible to other clients (a BatchWriter sketch follows the figure below)
[Figure: Two YCSB++ store clients and the table store servers, coordinated through a ZooKeeper cluster manager. Client #1 buffers its inserts in a batch while client #2 attempts to read the key set {K}.]
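For reference, here is a minimal sketch of client-side batching using Accumulo's BatchWriter. The instance name, credentials, table, and buffer settings are illustrative assumptions, and the exact createBatchWriter signature differs across Accumulo versions.

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class BatchInsertExample {
  public static void main(String[] args) throws Exception {
    // Illustrative connection parameters.
    Connector conn = new ZooKeeperInstance("instance", "zkhost:2181")
        .getConnector("user", "password".getBytes());

    // Buffer up to 1 MB of mutations client-side, flush at most every
    // 60 s, using 2 writer threads. maxMemory is the batch-size knob.
    BatchWriter bw = conn.createBatchWriter("usertable",
        1024 * 1024L, 60 * 1000L, 2);

    Mutation m = new Mutation(new Text("row1"));
    m.put(new Text("cf"), new Text("field0"), new Value("v".getBytes()));
    bw.addMutation(m); // queued client-side, not yet visible to readers

    bw.close(); // flushes any buffered mutations to the tablet servers
  }
}

The buffer size passed here is the batch-size parameter varied in the throughput experiment on the next slide.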
Batch writing improves throughput
6 clients creating 9 million 1-KB records on 6 servers
− Small batches: high client CPU utilization limits throughput
− Large batches: servers saturate, yielding limited benefit from further batching
[Figure: Insert throughput (thousands of inserts per second) vs. batch size (10 KB, 100 KB, 1 MB, 10 MB) for HBase and Accumulo.]
Batch writing causes weak consistency
Test setup: ZooKeeper-based client coordination
• Shared producer-consumer queue between readers and writers
• R-W lag = delay before client C2 can read client C1's most recent write (a queue sketch follows the figure below)
[Figure: Client #1 inserts {K:V} (10^6 records) through its batch buffer (1) and enqueues a 1% sample of keys K in the ZooKeeper queue (2); client #2 polls and dequeues K (3) and then reads {K} from the table store servers (4).]
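Here is a minimal sketch of the shared queue, assuming persistent-sequential znodes under a pre-created /queue parent. The paths and details are illustrative rather than YCSB++'s exact implementation.

import java.util.Collections;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical producer-consumer queue over sequential znodes.
public class KeyQueue {
  private final ZooKeeper zk;

  public KeyQueue(ZooKeeper zk) { this.zk = zk; }

  // Writer side: publish a sampled key after inserting it.
  public void enqueue(String key) throws Exception {
    zk.create("/queue/key-", key.getBytes(),
              ZooDefs.Ids.OPEN_ACL_UNSAFE,
              CreateMode.PERSISTENT_SEQUENTIAL);
  }

  // Reader side: take the oldest key, or null if the queue is empty.
  public String dequeue() throws Exception {
    List<String> children = zk.getChildren("/queue", false);
    if (children.isEmpty()) return null;
    Collections.sort(children); // sequence numbers give FIFO order
    String node = "/queue/" + children.get(0);
    byte[] data = zk.getData(node, false, null);
    zk.delete(node, -1);
    return new String(data);
  }
}

The reader then re-issues read(K) until the value becomes visible; the elapsed time is the read-after-write lag plotted on the next slide.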
Batch writing causes weak consistency
Deferred writes win on throughput, but the read-after-write lag can reach ~100 seconds
− (N%) = fraction of requests that needed multiple read()s
− The implementation of batching affects the median latency
[Figure: CDFs of read-after-write time lag (ms) for different buffer sizes.
(a) HBase: 10 KB (<1%), 100 KB (7.4%), 1 MB (17%), 10 MB (23%)
(b) Accumulo: 10 KB (<1%), 100 KB (1.2%), 1 MB (14%), 10 MB (33%)]
ILLUSTRATIVE EXAMPLE #2
Benchmarking high-speed ingest features using YCSB++
Features for high-speed insertions
Most table stores have high-speed ingest features
− Periodically insert large amounts of data, or migrate old data in bulk
− Classic relational DB techniques applied to the new stores
Two features: bulk loading and table pre-splitting
• Less data migration during inserts
• Engages more tablet servers immediately
• Needs careful tuning and configuration [Shasha2002]
8-phase test setup: table bulk loading
Bulk loading involves two steps
− Hadoop-based data formatting
− Importing the store files into the table store
Pre-load phases (1 and 2)
− Bulk load 6M rows into an empty table
− Goal: parallelism by engaging all servers
Load phases (4 and 5)
− Load 48M new rows
− Goal: study rebalancing during ingest
R/U measurement phases (3, 6, and 8)
− Correlate latency with rebalancing work (an import sketch follows the phase list below)
Phases: 1) Pre-Load (re-formatting), 2) Pre-Load (importing), 3) Read/Update workload, 4) Load (re-formatting), 5) Load (importing), 6) Read/Update workload, 7) Sleep, 8) Read/Update workload
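As a sketch of the importing step (phases 2 and 5), here is a minimal example assuming Accumulo's TableOperations.importDirectory call; the paths are illustrative and the exact signature varies by Accumulo version.

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;

public class BulkImportExample {
  public static void main(String[] args) throws Exception {
    // Illustrative connection parameters.
    Connector conn = new ZooKeeperInstance("instance", "zkhost:2181")
        .getConnector("user", "password".getBytes());

    // Hand the pre-formatted, sorted store files (written by the
    // Hadoop job in the re-formatting phase) to the tablet servers.
    // Files that cannot be imported land in the failure directory.
    conn.tableOperations().importDirectory(
        "usertable",      // target table
        "/bulk/files",    // HDFS dir of formatted store files
        "/bulk/failures", // HDFS dir for rejected files
        false);           // setTime: keep the timestamps in the files
  }
}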
Read latency affected by rebalancing work
[Figure: Accumulo read latency (ms, log scale) over each 300-second measurement phase, for R/U 1 (Phase 3), R/U 2 (Phase 6), and R/U 3 (Phase 8), shown alongside the 8-phase timeline.]
• Latency is high after heavy insertion periods that force servers to rebalance (compactions)
• Latency drops once the store reaches a steady state
Rebalancing on ACCUMULO servers
• The OTUS monitor shows the server-side compactions during the post-ingest measurement phases
[Figure: Counts of StoreFiles, Tablets, and Compactions (log scale) over the experiment's 1800-second run, shown alongside the 8-phase timeline.]
HBASE is slower: Different compaction policies
[Figure: Side-by-side comparison for Accumulo and HBase: read latency (ms, log scale) per 300-second measurement phase (R/U 1, Phase 3; R/U 2, Phase 6; R/U 3, Phase 8), and StoreFiles, Tablets, and Compactions over the 1800-second experiment run.]
Extending to table pre-splitting
[Figure: Two phase pipelines compared. The bulk loading test (pre-load re-formatting and importing, R/U, load re-formatting and importing, R/U, sleep, R/U) next to the table pre-splitting test, which replaces the re-formatting and importing phases with a pre-split of the table into N ranges followed by direct pre-load and load phases.]
Pre-split a key range into N partitions to avoid tablet splitting during insertion (see the sketch below)
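Here is a minimal pre-splitting sketch, assuming Accumulo's TableOperations.addSplits API; the table name, N, and split-point scheme are illustrative.

import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.hadoop.io.Text;

public class PreSplitExample {
  public static void main(String[] args) throws Exception {
    // Illustrative connection parameters.
    Connector conn = new ZooKeeperInstance("instance", "zkhost:2181")
        .getConnector("user", "password".getBytes());

    // Carve the key space into N ranges up front so inserts engage
    // all tablet servers immediately instead of splitting on demand.
    int n = 16;
    SortedSet<Text> splits = new TreeSet<Text>();
    for (int i = 1; i < n; i++) {
      // Evenly spaced split points over a hex-suffixed key space.
      splits.add(new Text(String.format("user%02x", i * 256 / n)));
    }
    conn.tableOperations().addSplits("usertable", splits);
  }
}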
Outline
• Problem
• YCSB++ design
• Illustrative examples
• Ongoing work and summary
Things not covered in this talk
More features: function shipping to the servers
− Data filtering at the servers
− Fine-grained, cell-level access control
More details in the ACM SOCC 2011 paper
Ongoing work
− Analyze more table stores: Cassandra, CouchDB, MongoDB
− Continue this research through the new Intel Science and Technology Center for Cloud Computing at CMU (with GaTech)
Summary: YCSB++ tool
• Tool for performance debugging and benchmarking advanced features, built as new extensions to YCSB
• Two case studies: Apache HBASE and ACCUMULO
• Tool available at http://www.pdl.cmu.edu/ycsb++
Features tested and the YCSB++ extensions used:
• Weak consistency -> distributed clients using ZooKeeper
• Fast insertions (pre-splits & bulk loads) -> multi-phase testing (with Hadoop)
• Server-side filtering and fine-grained access control -> new workload generators and database client API extensions