An Introduction to Apache HBase
Ian Wrigley
Curriculum Manager, Cloudera
ian@cloudera.com
@iwrigley
© Copyright 2013 Cloudera. All rights reserved. Not to be reproduced without prior written consent.

Agenda
§ What is HBase?
§ HBase usage scenarios
§ HBase table basics
§ HBase architecture
§ HBase schema fundamentals
§ Questions

What is HBase?
§ HBase is a distributed, scalable big data store built on top of Hadoop
§ Sometimes referred to as a ‘NoSQL’ data store
  – Access patterns are restricted to just get, put, and scan (partial or full table scan)
  – Does not use SQL to access the data
§ Goal: low-latency, consistent, random read/write of data
§ Based on Google’s BigTable
  – For a long time, the data store for the Google Web Crawler’s data, GMail, Google Analytics…

HBase is NOT a Traditional RDBMS
                             | RDBMS                        | HBase
Data layout                  | Row- or column-oriented      | Column Family-oriented
Transactions                 | Multi-row ACID               | Single row only
Query language               | SQL                          | get/put/scan
Security                     | Authentication/Authorization | Column Family-level authentication/authorization
Indexes                      | On arbitrary columns         | Row-key only
Max data size                | TBs                          | PB+
Read/write throughput limits | 1000s queries/second         | Millions of “queries”/second

HBase is Built on Hadoop
§ Hadoop provides:
  – Fault tolerance
  – Scalability
  – Batch processing with MapReduce
§ HBase provides:
  – Random reads and writes
  – High throughput
  – Caching
§ HBase data is all stored in HDFS
§ Note: HBase does not use MapReduce!
  – HBase is real-time, MapReduce is not
  – Although it is possible to run MapReduce jobs on data in HBase tables

Low-Latency Random Data Access
§ Writes
  – 1-3ms
  – 1,000 to 10,000 per node per second
§ Reads
  – 0-3ms cached
  – 10-30ms from disk
  – 10,000-40,000 reads/sec/node from cache
§ Read and write data anywhere in the table
  – No requirement for sequential writes

Usage Scenarios for HBase
§ Lots of data
  – Hundreds of gigabytes up to petabytes
§ High write throughput
  – 1000s of writes/second per node
  – Scales to hundreds of thousands of writes/second across the cluster
§ Scalable cache capacity
  – Adding nodes adds to available cache
§ Data layout
  – Excels at key lookup
  – No penalty for sparse columns

When To Use HBase
§ Use HBase if…
  – You need random write, random read, or both (but not neither)
  – You need to do many thousands of operations per second on multiple TB of data
  – Your access patterns are well-known and simple
§ Don’t use HBase if…
  – You only append to your dataset, and tend to read the whole thing
  – You primarily do ad-hoc analytics (ill-defined access patterns)
  – Your data easily fits on one beefy node
  – You’re only doing it because it’s what the cool kids are using

A Couple of Large HBase Users…
§ eBay
  – ‘Cassini’ cluster indexes the entire eBay site inventory
  – Approx 15TB of data
  – Random write: 200,000,000 rows/day
  – Bulk data import: 500,000,000 rows in 30 minutes
  – 1.2TB of data imported each day
§ Facebook
  – Uses HBase for its messaging store
  – Stores small messages and message indexes in HBase
  – 75B+ R/W operations/day
  – At peak, 1.5M operations/second
  – 2PB+ of data in HBase

Overview
§ Tables are composed of rows and columns
§ Every row has a row key (analogous to a primary key in a traditional RDBMS)
  – Rows are stored sorted by row key for fast lookups
§ All columns in HBase belong to a particular column family
§ A table has one or more column families
  – Typically a table will have a small number of column families
  – Column families should rarely change
  – A column family can have any number of columns
  – Columns within a family are sorted and stored together
  – Columns only exist when inserted
  – NULLs are free
§ Table cells are versioned, uninterpreted arrays of bytes
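The data model above can be pictured as nested sorted maps: row key, then column, then timestamp, down to a byte array. The following is only a toy in-memory sketch of that logical layout, not how HBase is implemented; the class name ToyTable and its methods are hypothetical, and the timestamp used for the 'cutting' row is arbitrary.

```java
import java.nio.charset.StandardCharsets;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of the logical layout only (hypothetical class, not an HBase API).
class ToyTable {
    // row key -> "family:qualifier" -> timestamp -> uninterpreted cell bytes
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    // A column comes into existence only when something is written to it.
    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(column, k -> new TreeMap<>())
            .put(timestamp, value.getBytes(StandardCharsets.UTF_8));
    }

    // Returns the newest version of a cell, or null if the column was never written.
    String getLatest(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, byte[]>> row = rows.get(rowKey);
        if (row == null || !row.containsKey(column)) {
            return null; // absent columns cost nothing: there is no stored NULL
        }
        return new String(row.get(column).lastEntry().getValue(), StandardCharsets.UTF_8);
    }

    // Rows are kept sorted by row key, so the first key is the smallest one.
    String firstRowKey() {
        return rows.firstKey();
    }

    public static void main(String[] args) {
        ToyTable t = new ToyTable();
        t.put("tlipcon", "roles:hadoop", 2010L, "Committer");
        t.put("tlipcon", "roles:hadoop", 2011L, "PMC");
        t.put("cutting", "roles:hadoop", 1L, "Founder"); // timestamp is arbitrary
        System.out.println(t.getLatest("tlipcon", "roles:hadoop"));
        System.out.println(t.getLatest("cutting", "roles:hbase"));
        System.out.println(t.firstRowKey());
    }
}
```

Using sorted maps at every level mirrors the slide's points that rows are sorted by row key and that columns within a family are sorted and stored together.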

Logical View as ‘Records’
Row key | info:height | info:state | roles:hadoop                         | roles:hbase
cutting | ‘9ft’       | ‘CA’       | ‘Founder’                            |
tlipcon | ‘5ft7’      | ‘CA’       | ‘PMC’ @ts=2011, ‘Committer’ @ts=2010 | ‘Committer’
  – The row key is an implicit PRIMARY KEY in RDBMS terms
  – Data is all byte[] in HBase
  – A single cell might have different values at different timestamps
  – Different rows may have different sets of columns (the table is sparse)
  – Column format is family:qualifier
(Slide footer: 9/24/12 STL HUG / Strangeloop unsessions)

Physical Storage
§ Physically, data is stored on a per-Column Family basis as a sorted map
  – Ordered by row key, then column key, in ascending order
  – For the same row key and column qualifier, ordered by timestamp in descending order
Row key | Column key | Timestamp     | Cell value
Row1    | info:aaa   | 1273516197868 | valueA
Row1    | info:bbb   | 1273871824184 | valueB
Row1    | info:ccc   | 1273746289103 | valueC
Row2    | info:hello | 1273878447049 | i_am_a_value
Row3    | info:aaa   | 1273616297446 | another_value
(Sorted by row key and column)
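The sort order described for physical storage can be expressed as a comparator. This is a minimal sketch under simplifying assumptions: keys are plain Java strings rather than byte arrays, and the KeyValueOrder and Cell names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: strings stand in for the byte[] keys HBase actually compares.
class KeyValueOrder {
    static final class Cell {
        final String row;
        final String column;
        final long timestamp;
        final String value;

        Cell(String row, String column, long timestamp, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Row key ascending, then column key ascending, then timestamp descending.
    static final Comparator<Cell> ORDER = (a, b) -> {
        int byRow = a.row.compareTo(b.row);
        if (byRow != 0) {
            return byRow;
        }
        int byColumn = a.column.compareTo(b.column);
        if (byColumn != 0) {
            return byColumn;
        }
        return Long.compare(b.timestamp, a.timestamp); // newest version first
    };

    // Returns a sorted copy and leaves the input list untouched.
    static List<Cell> sorted(List<Cell> cells) {
        List<Cell> copy = new ArrayList<>(cells);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
                new Cell("Row2", "info:hello", 1273878447049L, "i_am_a_value"),
                new Cell("Row1", "info:bbb", 1273871824184L, "valueB"),
                new Cell("Row1", "info:aaa", 1273516197868L, "valueA"));
        for (Cell c : sorted(cells)) {
            System.out.println(c.row + " " + c.column + " " + c.timestamp + " " + c.value);
        }
    }
}
```

Putting the newest timestamp first means a reader looking for the current value of a cell can stop at the first match.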

Versions
§ By default, HBase keeps three versions of each cell
§ The versions are sorted by their timestamp (in descending order)
Key  | Column  | Value      | Timestamp
rowA | Fam:foo | New value  | 1275340679713
rowA | Fam:foo | Old value  | 1275091706190
rowB | Fam:foo | Some value | 1274999316683
(Versions of the same cell are sorted by timestamp in descending order)
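The retention rule can be sketched with a map sorted newest-first that drops the oldest entry once a limit is exceeded. This is a toy illustration: the VersionedCell class is hypothetical, and pruning eagerly on every write is a simplification, not a description of when HBase itself discards old versions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper showing "keep at most N versions, newest first".
class VersionedCell {
    private final int maxVersions;
    // timestamp -> value, iterated from newest to oldest
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // in reverse order, the last entry is the oldest
        }
    }

    // Values from newest to oldest.
    List<String> values() {
        return new ArrayList<>(versions.values());
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(3);
        cell.put(1275091706190L, "Old value");
        cell.put(1275340679713L, "New value");
        System.out.println(cell.values());
    }
}
```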

HBase Columns and Column Families
§ Columns are grouped into Column Families (CFs)
§ All column family members have the same prefix
  – E.g., info:height and info:state
  – The “:” delimits the CF from the qualifier
§ Columns can be created on the fly
§ Physically, all column family members are stored together
§ Column families must be declared at schema definition time
§ Tuning and storage settings can be specified for each Column Family

Column Family Attributes
Attribute   | Possible values        | Default
COMPRESSION | NONE, GZ, LZO, SNAPPY  | NONE
VERSIONS    | 1+                     | 3
TTL         | 1-2147483647 (seconds) | FOREVER (special value, means the data is never deleted)
BLOCKSIZE   | 1 byte - 2GB           | 64K
IN_MEMORY   | true, false            | false
BLOCKCACHE  | true, false            | true

Comparison with RDBMS Design
§ Scaling relational tables often means partitioning or sharding data
  – HBase automatically partitions data in regions
  – A region is a range of rows
  – Regions are automatically split (broken into two) when they become too large
§ In relational databases, one might normalize tables and use joins to retrieve data
  – HBase does not support explicit joins
  – A lookup by row key implicitly joins data from column families if necessary

Supported Data Types
§ Bytes-in/bytes-out interface
§ Anything that can be converted to an array of bytes can be stored
  – Input can be strings, numbers, complex objects, images, etc.
§ Cell size
  – Practical limits to the size of values
  – In general, cell size should not consistently be above 2-3MB
  – For large cell size:
    – Increase the block size
    – Increase the maximum region size for the table
    – Keep the index size reasonable
§ Counters
  – Synchronization is done on the RegionServer (not client)
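Because the interface is bytes-in/bytes-out, the application is responsible for encoding and decoding every value. A minimal sketch using only the Java standard library follows; the ByteCodec class and its method names are hypothetical, and real client code would typically use HBase's own Bytes utility class for the same purpose.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the store sees only byte[], so the client picks the encoding.
class ByteCodec {
    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Fixed-width 8-byte encoding (ByteBuffer is big-endian by default).
    static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static long decodeLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    public static void main(String[] args) {
        byte[] state = encodeString("CA");
        byte[] ts = encodeLong(1273516197868L);
        System.out.println(decodeString(state) + " stored in " + state.length + " bytes");
        System.out.println(decodeLong(ts) + " stored in " + ts.length + " bytes");
    }
}
```

Nothing in the stored bytes records which encoding was used, so reader and writer must agree on it out of band.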

Access HBase via its API
§ Data operations
  – Get
  – Put
  – Scan
  – Increment
  – CheckAndPut
  – Delete
§ Access via HBase shell, Java API, REST proxy
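
Access HBase via its API (cont’d)
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    import static org.junit.Assert.assertArrayEquals;

    byte[] row = Bytes.toBytes("rowkey");
    byte[] family = Bytes.toBytes("cf1");         // column family
    byte[] qualifier = Bytes.toBytes("colname");  // column qualifier within cf1
    byte[] putVal = Bytes.toBytes("cell value here");

    Configuration config = HBaseConfiguration.create();
    HTable table = new HTable(config, "myTable");

    // Write one cell, then read it back
    Put p = new Put(row);
    p.add(family, qualifier, putVal);
    table.put(p);

    Get g = new Get(row);
    Result r = table.get(g);
    byte[] getVal = r.getValue(family, qualifier);
    assertArrayEquals(putVal, getVal);  // compares array contents, not references
    table.close();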

Major Components of an HBase Cluster
§ ZooKeeper
  – A centralized service used to maintain configuration information for HBase
§ Catalog Tables
  – Keep track of the locations of region servers and regions
§ Master
  – Monitors all region server instances in the cluster
  – The interface for all metadata changes
§ RegionServer
  – Responsible for serving and managing regions
§ Region
  – A set of rows belonging to a table

ZooKeeper
§ ZooKeeper service
  – Stores global information about the cluster
  – Provides synchronization and detects master node failure
  – Holds the location of the -ROOT- table and the master
[Diagram: the Client looks up the Master and -ROOT- locations in ZooKeeper, then reads and writes data directly on the RegionServers (the client rarely needs the Master). The Master registers the Master and -ROOT- locations in ZooKeeper, assigns regions to RegionServers, and checks the health of RegionServers. Table 'Foo' is shown with Region 1 (Key001 to Key499) on RegionServer 1 and Region 2 (Key500 to Key999) on RegionServer 2; each RegionServer has an HLog.]

Master
§ Responsible for coordinating the region servers
§ Assigns regions, detects region server failures
§ Handles schema changes
§ Master runs several background threads
  – LoadBalancer periodically reassigns regions in the cluster
  – CatalogJanitor periodically checks and cleans up the .META. table
§ An HBase cluster can have multiple masters
  – Upon startup all compete to run the cluster
  – If the active master loses its lease in ZooKeeper, the remaining masters compete for the master role

RegionServers
§ Daemons which run on some (typically all) of the slave nodes in the cluster
§ Serve data for reads and writes of rows contained in regions
§ Regions which become too large will automatically be split

Catalog Tables
§ -ROOT- Catalog Table
  – A table that lists the location of the .META. table
§ .META. Catalog Table
  – A table that lists all the regions and their locations

Regions
§ Holds a subset of a table’s rows, like a partition
  – Region is specified by its startKey and endKey
§ A table may have one or more regions
  – Each region is composed of one store per column family
§ New regions are automatically created as tables grow
  – Each region may live on a different node
  – Made up of several HDFS files (store files)
[Diagram: a RegionServer with its HLog, hosting a region of Table 'Foo' and a region of Table 'Bar'; each region contains a Store (ColumnFamily1) holding row 1, row 2, row 3, and so on.]
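Since each region covers a contiguous key range, finding the region for a row key amounts to finding the greatest region start key that is less than or equal to that row key. The sketch below illustrates this with a sorted map; the RegionLookup class, the empty-string start key for the first region, and the server names are assumptions for illustration, and a real client resolves locations through the catalog tables instead.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical lookup table: region start key -> RegionServer hosting that region.
class RegionLookup {
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        serverByStartKey.put(startKey, server);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String serverFor(String rowKey) {
        Map.Entry<String, String> entry = serverByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalArgumentException("no region covers row key " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookup lookup = new RegionLookup();
        lookup.addRegion("", "RegionServer 1");       // assumed: first region starts at the empty key
        lookup.addRegion("Key500", "RegionServer 2"); // illustrative split point
        System.out.println(lookup.serverFor("Key123"));
        System.out.println(lookup.serverFor("Key777"));
    }
}
```

A region split would show up in this model as one extra entry: the new start key is the split point, and everything at or above it maps to the new region.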

Region Splits
§ When regions get too big (256MB by default) they are automatically split
§ Resulting regions may be served by the same, or different, RegionServers
§ HBase periodically ‘balances’ the regions across RegionServers

Data Storage
§ Data is first written to the region’s Write-Ahead-Log (WAL) and then to memstore
  – The WAL is required for crash recovery if the memstore is lost
§ Memstore is flushed to an immutable file in HDFS (store file) periodically
§ Eventually these store files will be aggregated and cleaned up during a compaction
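The write path can be sketched as a toy in-memory store: log first, then buffer, then flush, then compact. Everything here is a simplification for illustration. The ToyStore class is hypothetical, flushing is triggered by entry count rather than memory size, and the "files" are just in-memory maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of WAL -> memstore -> store files -> compaction (not HBase code).
class ToyStore {
    private final List<String> wal = new ArrayList<>();                         // append-only log
    private TreeMap<String, String> memstore = new TreeMap<>();                 // sorted in-memory buffer
    private final List<TreeMap<String, String>> storeFiles = new ArrayList<>(); // never modified after flush
    private final int flushThreshold;                                           // assumed: entry count

    ToyStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String key, String value) {
        wal.add(key + "=" + value); // 1. record the edit in the log first
        memstore.put(key, value);   // 2. then apply it to the memstore
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    // Turns the current memstore into a new store file and starts a fresh buffer.
    void flush() {
        if (memstore.isEmpty()) {
            return;
        }
        storeFiles.add(memstore);
        memstore = new TreeMap<>();
    }

    // Merges all store files into one; values from newer files overwrite older ones.
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : storeFiles) {
            merged.putAll(file);
        }
        storeFiles.clear();
        if (!merged.isEmpty()) {
            storeFiles.add(merged);
        }
    }

    // Reads check the memstore first, then store files from newest to oldest.
    String get(String key) {
        if (memstore.containsKey(key)) {
            return memstore.get(key);
        }
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            String value = storeFiles.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    int storeFileCount() {
        return storeFiles.size();
    }

    int walSize() {
        return wal.size();
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore(2);
        store.put("a", "1");
        store.put("b", "2");
        store.put("a", "3");
        store.put("c", "4");
        System.out.println("store files before compaction: " + store.storeFileCount());
        store.compact();
        System.out.println("store files after compaction: " + store.storeFileCount());
        System.out.println("a = " + store.get("a"));
    }
}
```

Because flushed files are never modified, an update to an existing key lands in a newer file, which is why reads consult newer files first and why compaction is needed to merge them.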

Schema Fundamentals
§ Schema design is a combination of (amongst other things):
  – Designing the keys (row and column)
  – Segregating data into column families
  – Choosing appropriate compression and block size settings
§ Similar techniques are needed to scale most systems
  – e.g., indexes, partitioning data, consistent hashing
§ Overcome shortcomings of architecture
  – Denormalization -> Replacement for JOINs
  – Duplication -> Design for reads
  – Intelligent Keys -> Implement indexing, sorting and optimize reads
§ You must consider your access pattern when designing the table schema
  – Failure to do so will result in dreadful performance

Column Family Design
§ Recommend no more than three Column Families
§ Column Families allow for separation of data
  – Used by columnar databases for fast analytical queries, but on column level only
  – Data across CFs is typically not accessed simultaneously
§ Attributes are applied on a per-Column Family basis
  – e.g., different or no compression depending on the content type

Row Key Design
§ Row keys cannot be changed
  – Row must be deleted and then re-inserted
§ Rows are sorted on insert, not on scan
§ Keys are ordered lexicographically
  – E.g., 1, 10, 100, 11, 12, 13 . . . 2, 20, 21, . . .
  – Preserve natural ordering of numbers by left-padding with 0s
§ The row key is the only key/index on the table
  – No secondary keys/indexes or foreign keys
§ Selecting the appropriate row keys for your application is critical for performance!
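The effect of lexicographic ordering, and of left-padding with zeros, is easy to reproduce with sorted strings. This is a small sketch: the RowKeyPadding class is hypothetical, it handles non-negative numbers only, and it compares Java strings, which for ASCII digits gives the same order as a byte-wise comparison.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper comparing unpadded and zero-padded numeric row keys.
class RowKeyPadding {
    // Left-pads a non-negative number with zeros to a fixed width.
    static String pad(long id, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", id);
    }

    // Builds string keys (width 0 means no padding) and sorts them lexicographically.
    static List<String> sortedKeys(long[] ids, int width) {
        List<String> keys = new ArrayList<>();
        for (long id : ids) {
            keys.add(width > 0 ? pad(id, width) : Long.toString(id));
        }
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        long[] ids = {1, 2, 10, 100, 11};
        System.out.println(sortedKeys(ids, 0)); // unpadded: 2 sorts after 100
        System.out.println(sortedKeys(ids, 3)); // padded: numeric order is preserved
    }
}
```

The padding width has to be chosen up front to fit the largest expected value, since row keys cannot be changed later without deleting and re-inserting the row.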

How to Design your Table?
§ This is not a relational database!
  – Typically there are only a few, large (denormalized) tables
  – Each table will have a small number of Column Families
  – Within each CF you may have hundreds or thousands of columns
§ Think about your access patterns…
  – Columns that are accessed together should be assigned to the same Column Family
  – Row keys determine how closely on disk rows are stored
  – Recall that data is assigned to Regions according to Row keys
§ Include enough information in your Row Key so that you can avoid table scans
§ Be wary of hotspotting

Hotspotting: or, How to Ruin HBase Performance
§ Be careful with the design of your Row Key
§ Example: a monotonically increasing value
  – 0000001, 0000002, 0000003, 0000004, etc.
§ All writes will go to the same RegionServer
  – Even after the region splits
  – Results in an absolute bound on write performance
§ Instead, try to have your row keys distributed around the regions
  – MD5 hash of the row key for example
§ Consider your read pattern
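One way to spread sequential keys around the key space is to prefix each key with part of its MD5 hash. The sketch below shows the idea using the standard MessageDigest class; the HashedRowKey class, the "-" separator, and the choice of prefix length are assumptions made for illustration rather than a recommended scheme.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derive a well-distributed row key from a sequential one.
class HashedRowKey {
    // Lower-case hex MD5 digest of the UTF-8 bytes of the input.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest not available", e);
        }
    }

    // Prefixes the key with the first prefixLen hex digits of its hash;
    // the original key is kept after the separator so it stays readable.
    static String salted(String rowKey, int prefixLen) {
        return md5Hex(rowKey).substring(0, prefixLen) + "-" + rowKey;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"0000001", "0000002", "0000003"}) {
            System.out.println(salted(key, 4));
        }
    }
}
```

The trade-off is the one the last bullet hints at: hashed keys spread writes across regions, but neighbouring original keys are no longer adjacent, so range scans over the original ordering stop being efficient.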

Schema Design: Conclusion
§ HBase schema design is difficult!
§ It’s the single most critical factor in the performance of your HBase cluster
§ There’s way more to it than we have time to cover here

Questions
§ Questions? Ask away!
§ Thanks to Truecar for hosting the event and providing the food and drink
§ Thanks to Cloudera for providing this evening’s speaker
§ Discount on Cloudera’s HBase training course: HBaseSoCal
  – 15% off any HBase training class delivered by Cloudera
  – Expires 07/01/13
§ Teamtreehouse.com is offering a 3 month pass. Learn to build websites, create iPhone and Android apps, code with Ruby on Rails and PHP, or start a business. Please email Subash directly for that. His details are available on the meetup website