What's coming in HBase 0.96
Jonathan Hsieh | @jmhsieh
Software Engineer @ Cloudera / Apache HBase PMC
LA HBase User Group, 4/17/2013
Who Am I?
• Cloudera:
• Software Engineer since June 2009
• Apache HBase committer / PMC
• Apache Flume founder / PMC
• Apache Sqoop committer / PMC
• U of Washington:
• Research in Distributed Systems
What is Apache HBase?
Apache HBase is an open-source, distributed, scalable, consistent, low-latency, random-access non-relational database built on Apache Hadoop.
[Diagram: applications and MapReduce on HBase, which runs on ZooKeeper and HDFS]
Sorted Map Datastore
• It is a big table
• Tables consist of rows, each of which has a primary row key
• Each row has a set of columns
• Rows are stored in sorted order
[Diagram: rows keyed 0000000000 through 7777777777, stored in sorted row-key order]
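The sorted-map model described above can be sketched in a few lines of Python. This is a toy illustration of the data model, not HBase client code; the row keys and the `info:name` column are made up for the example.

```python
# A toy sketch of the HBase data model (illustration only, not HBase
# client code): a table maps a row key to a set of columns, and scans
# walk rows in sorted row-key order.
class ToyTable:
    def __init__(self):
        self.rows = {}  # row key -> {column: value}

    def put(self, row, column, value):
        self.rows.setdefault(row, {})[column] = value

    def get(self, row):
        return self.rows.get(row, {})

    def scan(self, start_row="", stop_row=None):
        # Like an HBase scan: rows come back in sorted row-key order,
        # start key inclusive, stop key exclusive.
        for key in sorted(self.rows):
            if key < start_row:
                continue
            if stop_row is not None and key >= stop_row:
                break
            yield key, self.rows[key]

t = ToyTable()
t.put("2222222222", "info:name", "bob")
t.put("0000000000", "info:name", "alice")
t.put("5555555555", "info:name", "carol")
print([k for k, _ in t.scan("1111111111", "6666666666")])
# ['2222222222', '5555555555']
```

Because rows live in sorted order, a range scan is just a walk over contiguous keys, which is what makes the region split in the next slide natural.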
Tables and Regions
• Tables are divided into sets of rows called regions
• Scale read and write capacity by spreading across many regions.
[Diagram: the sorted row-key space 0000000000 through 7777777777 split into regions, each serving a contiguous key range]
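The region idea, cutting the sorted key space into contiguous ranges, can be sketched as follows. This is illustrative only: the split keys are made up, and real HBase locates a row's region through its metadata table rather than a local function like this.

```python
import bisect

# Sketch: a table's sorted row-key space is cut into regions by split
# keys; a row is served by the region whose key range contains it.
split_keys = ["2222222222", "4444444444", "6666666666"]  # => 4 regions

def region_for(row_key):
    # bisect_right counts how many split keys are <= row_key,
    # which is exactly the index of the hosting region.
    return bisect.bisect_right(split_keys, row_key)

print(region_for("0000000001"))  # 0 (first region)
print(region_for("3333333333"))  # 1
print(region_for("7777777777"))  # 3 (last region)
```

Spreading these ranges across many servers is what scales read and write capacity.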
HBase provides Low-latency Random Access
• Writes: 1-3ms; 1k-10k writes/sec per node
• Reads: 0-3ms cached, 10-30ms from disk; 10-40k reads/sec per node from cache
• Cell size: 0-3MB preferred
• Read, write, and insert data anywhere in the table
• No sequential write limitations
Past, Present, Future
• Historical Overview
• HBase Internals
• HBase Operations
• External Applications
Those who don't know history are destined to repeat it.
Historical Overview
Origin of Apache HBase
• 2004: Google publishes the MapReduce and GFS papers
• 2006: Google publishes the BigTable paper; Hadoop open sources MapReduce & HDFS
• 2007: First HBase commit (as a Hadoop contrib project)
• 2009: The term "NoSQL" re-enters the lexicon
• 2010: Apache HBase becomes a TLP; Facebook uses HBase for Messages
• 2011: Cloudera releases CDH3 with HBase 0.90
• 2012: Cloudera releases CDH4 with HBase 0.92
Inspiration: Google BigTable (OSDI 2006)
• Goal: Low latency, Consistent, random read/write access to massive amounts of structured data.
• It was the data store for Google's crawler web table, Gmail, Analytics, Earth, Blogger, …
It begins
commit 454a9dbe046194f8eef3dddc3e5942910dd5b7a1
Author: Douglass Cutting <[email protected]>
Date: Tue Apr 3 20:34:28 2007 +0000
HADOOP-1045. Add contrib/hbase, a BigTable-like online database.
Key accomplishments: HBase 0.90 and 0.92
• 0.90 (1/2011)
• Proper sync
• Import / Export / Backup tools
• Bulk load introduced
• 0.92 (1/2012)
• Coprocessors (similar to triggers)
• ACID row transactions
• Strong authentication and ACLs
• HFile v2
Where are we today (4/17/2013)?
• HBase 0.94.0 (4/2012): stable release
• Latest is 0.94.6 (4/4/2013)
• A performance release built on 0.92
• MR2 support experimental; HDFS 2.0 supported
• HBase 0.95.0 (4/7/2013): development release
• A series of quick releases that will become HBase 0.96
• Full Hadoop 2 (HDFS + MR2) support
Operations
HBase On a Cluster
[Diagram: HDFS NameNodes and HBase Masters, a ZooKeeper quorum, and slave boxes (DataNode + RegionServer) spread across racks]
Commodity Servers (circa 2012)
• 12x 1TB hard disks in a JBOD (Just a Bunch Of Disks) configuration
• 2 quad core CPUs, 2+GHz
• 24-96GB of RAM (96GB if you're considering HBase w/ MR)
• 2x 1-Gigabit Ethernet
• $5k-10k / machine
Causes of HBase Downtime within the cluster
• Unplanned Maintenance
• Hardware failures
• Software errors
• Human error
• Planned Maintenance
• Upgrades
• Migrations
Goal: Reduce downtime from hours to minutes to seconds.
Unplanned Downtime
• Two sources of unavailability
• Detection time
• Recovery time
[Timeline: failure event → detection time (service still thinks all is well) → service realizes there is a problem and starts fixing → recovery time → service is restored]
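The two sources of unavailability compose additively, which is why later slides attack both. A trivial sketch of the arithmetic (the recovery numbers here are made-up placeholders, not measurements; only the 180s vs 0-1s detection figures appear later in the deck):

```python
# Unavailability = detection time + recovery time, so shrinking either
# one shrinks downtime. Numbers are illustrative placeholders.
def downtime(detection_s, recovery_s):
    return detection_s + recovery_s

before = downtime(detection_s=180, recovery_s=600)  # 0.92-era: minutes
after  = downtime(detection_s=1,   recovery_s=60)   # 0.96 goal: seconds
print(before, after)  # 780 61
```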
Strategy: HBase-Supported Batch Backups
• Export / DistCp / Import
• 3 batch MR jobs
• Several extra copies of data
• High latency (hours)
• CopyTable
• 1 MR job
• Single copy of data
• High latency (hours)
• Incremental table copies
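One reason CopyTable can do incremental copies is that every HBase cell carries a timestamp, so a copy job can ship only the cells written inside a time window. A toy sketch of that filter (illustrative data model with made-up rows, not the actual MR job):

```python
# Toy cells as (row, column, value, timestamp) tuples.
cells = [
    ("row1", "info:name", "alice", 1000),
    ("row2", "info:name", "bob",   2000),
    ("row1", "info:name", "al",    3500),
]

def incremental_copy(cells, start_ts, end_ts):
    # Keep only cells whose timestamp falls in [start_ts, end_ts),
    # i.e. everything written since the last copy.
    return [c for c in cells if start_ts <= c[3] < end_ts]

print(incremental_copy(cells, 3000, 4000))
# only the cell written after the previous copy window
```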
Strategy: Custom Application-managed Replication
• Application writes to two instances of HBase
• Low Latency
• Adds complexity
• Inefficient
Strategy: HBase Replication (0.92+)
• HBase asynchronously copies edit logs to other clusters
• Replication lag measured in seconds
• Automatically catches up after failures
• Eventually consistent
• Efficient batching
• Master-slave† (0.90)
• Master-master (0.92)
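The asynchronous log-shipping idea can be sketched as a toy model: a master appends edits to a log, and a shipper drains the log to the slave in batches, so the slave lags by a bit but eventually catches up. This is not HBase's actual replication code; the rows and values are made up.

```python
from collections import deque

master, slave = {}, {}
log = deque()  # edit log awaiting shipment

def write(row, value):
    master[row] = value
    log.append((row, value))  # record the edit for shipping

def ship_batch(n):
    # Asynchronously replay up to n edits on the slave.
    for _ in range(min(n, len(log))):
        row, value = log.popleft()
        slave[row] = value

write("r1", "a"); write("r2", "b"); write("r3", "c")
ship_batch(2)
print(slave)            # slave is behind: {'r1': 'a', 'r2': 'b'}
ship_batch(10)
print(slave == master)  # True once the log drains (eventual consistency)
```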
Recovering from User Mistakes: Table Snapshots
• Snapshot the state of a table at a certain moment in time
• Restore or clone it later, creating a new read-write table
• Export it to another cluster with minimal impact on HBase
[Timeline: periodic snapshots; user error (drop 'table') takes the service down; restore from the last snapshot; service is restored with minor data loss]
Table Snapshots
• Offline snapshots
• Online Snapshots
• Snapshot uses
• Recover from application or user error.
• Application experimentation (no need to spin up another cluster for replication)
• Use MR directly on snapshot files
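Snapshots are cheap because store files are immutable: a snapshot records references to the current files rather than copying data. A toy sketch of that idea (illustrative file names; real HBase additionally archives dropped files so snapshot references stay valid):

```python
# A "table" is just its list of immutable store files.
table_files = ["hfile-1", "hfile-2"]

snapshot = list(table_files)   # metadata-only: copy the references

table_files.append("hfile-3")  # table keeps changing...
table_files.remove("hfile-1")  # ...even dropping old files

restored = list(snapshot)      # restore: go back to the captured refs
print(restored)  # ['hfile-1', 'hfile-2']
```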
Backup Strategies by Version
• 0.90: CopyTable / Import / Export; master-slave replication† (high production impact)
• 0.92: CopyTable / Import / Export; master-master replication (high production impact)
• 0.94: CopyTable / Import / Export; master-master replication; snapshots*; snapshot export* (less production impact)
• 0.95 (dev): CopyTable / Import / Export; master-master replication; snapshots; snapshot export (less production impact)
† experimental (in progress) * backported in CDH
Internals
Reduce downtime by speeding up recovery time
• Distributed log splitting (0.92)
• Automated metadata repairs with hbck (0.92)
• Enabling writes while recovering from failure (0.96)
Reduce downtime by speeding up detection
• Proactively notify to recover from process failures quickly
• 0.92/0.94: master failure detection 180s; region server failure detection 180s
• 0.96: master process failure detection 0-1s; region server failure detection 0-1s
Recovery Improvement Features by Version
• 0.90: metrics; distributed log splitting*; HBCK improvements* (recovery in hours)
• 0.92: metrics; distributed log splitting; HBCK improvements (recovery in minutes)
• 0.94: CF + region granularity metrics; distributed log splitting; HBCK improvements (recovery in minutes)
• 0.95 (dev): CF + region granularity metrics; improved failure detection time; distributed log splitting; HBCK improvements (recovery in seconds)
† experimental (in progress) * backported in CDH
Prevent user mistakes: User-level Security
• Authentication:
• Ensure the identity of the services or users that are communicating
• Access Control:
• Ensure user has permission to execute table data operations
[Timeline: user error (drop 'table') → operation rejected, insufficient permissions]
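The access-control check amounts to gating every table operation on the authenticated user's grants. A toy sketch (made-up users and table; not HBase's ACL implementation):

```python
# Grants: (user, table) -> set of permissions.
grants = {("alice", "orders"): {"READ", "WRITE"},
          ("bob",   "orders"): {"READ"}}

def check(user, table, perm):
    # Reject the operation unless the user holds the permission.
    if perm not in grants.get((user, table), set()):
        raise PermissionError(f"{user} lacks {perm} on {table}")

check("alice", "orders", "WRITE")    # allowed
try:
    check("bob", "orders", "WRITE")  # bob is read-only
except PermissionError as e:
    print(e)  # bob lacks WRITE on orders
```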
Eliminate HBase downtime
• Highly available stack: HDFS (2.0) / ZK / HBase
• Client cross-version wire compatibility (0.96)
• Rolling restarts
• Online schema change (experimental in 0.94, solid in 0.95)
[Timeline: maintenance begins → service remains online, slightly degraded → full service is restored]
High Availability: HBase + HDFS + ZK
[Diagram: HA HDFS NameNodes and HBase Masters, a ZooKeeper quorum, and slave boxes (DataNode + RegionServer) spread across racks]
Wire Compatibility
• Reduces downtime due to planned maintenance
• Wire compatibility + extensible data formats
• Allow for forwards and backwards compatibility: older clients can talk to newer servers and vice versa
• Rolling upgrade: upgrade a single node at a time while the system runs
• Allows API changes while guaranteeing wire compatibility between different minor versions
• HDFS client-server compatibility between major versions
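Extensible wire formats get their forward compatibility from readers ignoring fields they don't understand, in the style of protocol buffers. A toy sketch (illustrative field names, not the actual HBase wire protocol):

```python
# What an "old" client understands.
KNOWN_FIELDS = {"row", "value"}

def parse(message):
    # Drop unknown fields instead of failing on them, so an old
    # client can still parse a newer server's response.
    return {k: v for k, v in message.items() if k in KNOWN_FIELDS}

newer_server_reply = {"row": "r1", "value": "v1", "cell_ttl": 60}
print(parse(newer_server_reply))  # {'row': 'r1', 'value': 'v1'}
```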
Eliminate Admin Race Conditions
• There are racy admin operations in 0.94
• Table locks (0.95)
• Enable clean online schema changes
• No longer requires table downtime due to disable!
• Greatly reduce the possibility of metadata corruption
• Enable safe snapshots and splits
• Prevent conflicts between merge and split
• Online merge (0.95)
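The table-lock fix is essentially serializing admin operations per table so they cannot interleave. A minimal sketch with a plain in-process lock (illustrative only; HBase's table locks are coordinated via ZooKeeper, not a local mutex):

```python
import threading

table_lock = threading.Lock()  # one lock per table in the sketch
history = []

def admin_op(name):
    with table_lock:  # one schema-changing operation at a time
        history.append(name + ":start")
        history.append(name + ":done")

threads = [threading.Thread(target=admin_op, args=(op,))
           for op in ("alter", "split", "merge")]
for t in threads: t.start()
for t in threads: t.join()

# Every op finishes before the next begins: no interleaving.
print(all(history[i].endswith(":start") and history[i + 1].endswith(":done")
          for i in range(0, len(history), 2)))  # True
```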
Downtime Avoidance Features by Version
• 0.90: upgrade by shutdown and restart; offline merge script (shutdowns and migration)
• 0.92: authentication and authorization; offline merge script (shutdown and restart)
• 0.94: HDFS 2.0 support (HA); authentication and authorization; online schema change†; offline merge script (rolling upgrade)
• 0.95 (dev): HDFS 2.0 support (HA); upgrade script; client wire compatibility; authentication and authorization; secure bulk load; table locks; online schema change; online merge†; offline merge script (one final shutdown and restart)
† experimental (in progress) * backported in CDH
Predictable Performance
• New goal: improving the latency of the 99th percentile
• Previously: JVM GC had a large impact; the MSLAB feature reduces memory fragmentation impact (0.90.x)
• Today: major compaction introduces latency volatility
• Striped compaction: reduces the chance of compacting large files under uniform write workloads
• Exploring compaction: improved heuristics to avoid the impact of bulk-loaded data
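A compaction's core job, merging several sorted store files into one while keeping only the newest version of each cell, can be sketched as follows (toy key/timestamp/value triples, not HBase's actual compaction code or file-selection heuristics):

```python
import heapq

# Two sorted "store files" of (row, timestamp, value) entries.
file_a = [("r1", 1, "old"), ("r3", 1, "c")]
file_b = [("r1", 2, "new"), ("r2", 1, "b")]

def compact(*files):
    merged = {}
    for key, ts, val in heapq.merge(*files):  # merge sorted inputs
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, val)           # newest version wins
    return [(k, *merged[k]) for k in sorted(merged)]

print(compact(file_a, file_b))
# [('r1', 2, 'new'), ('r2', 1, 'b'), ('r3', 1, 'c')]
```

Fewer, larger files after compaction means a read consults fewer files, which is why compaction scheduling dominates latency volatility.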
Future Features
• System tables
• Cell-level security
• QoS: allow low-latency workloads to be colocated with analytic MR jobs
• Multi-row transactions
• Client rewrite
Ecosystem
Production Apache HBase Applications
• Inbox
• Storage
• Web
• Search
• Analytics
• Monitoring
More Case Studies at http://www.hbasecon.com/agenda/
HBase Application Architecture
[Diagram: web applications reach HBase through Thrift/REST gateways; MapReduce analysis and bulk-import jobs use the HBase client directly, crossing the cluster boundary into HBase on HDFS]
Notable Services on HBase
• Streaming analysis on HBase
• Continuuity
• SQL built on HBase
• Splice Machine
• Drawn To Scale Spire
• User data management
• WibiData
Schema and SQL on HBase
• Hive
• Phoenix
• KijiSchema
• Impala
Conclusions
HBase Evolution
The Past:
• Recovery in hours, detection in minutes
• High-impact backups
• Shutdown to upgrade
• Highly variable performance
• Focus on new features
The Present (and Future):
• Recovery in seconds, detection in seconds
• Lower-impact backups
• Rolling upgrades
• Predictable 99th-percentile latency
• Focus on enabling the ecosystem
Install HBase on your laptop
• Download, untar, and start:
wget http://apache.osuosl.org/hbase/hbase-0.94.5/hbase-0.94.5.tar.gz
tar xvfz hbase-0.94.5.tar.gz
cd hbase-0.94.5
bin/start-hbase.sh
• Verify:
bin/hbase shell
Browse http://localhost:60010