HBase
Spring 2014, WPI, Mohamed Eltabakh

HBase: Overview
- HBase is a distributed column-oriented data store built on top of HDFS
- HBase is an Apache open source project whose goal is to provide storage for Hadoop distributed computing
- Data is logically organized into tables, rows and columns
HBase: Part of Hadoop's Ecosystem
HBase is built on top of HDFS: HBase files are internally stored in HDFS.

HBase vs. HDFS
- Both are distributed systems that scale to hundreds or thousands of nodes
- HDFS is good for batch processing (scans over big files)
  - Not good for record lookup
  - Not good for incremental addition of small batches
  - Not good for updates

HBase vs. HDFS (Cont'd)
- HBase is designed to efficiently address the above points
  - Fast record lookup
  - Support for record-level insertion
  - Support for updates (not in place)
- HBase updates are done by creating new versions of values

HBase vs. HDFS (Cont'd)
If the application has neither random reads nor random writes, stick to HDFS.

HBase Data Model
- HBase is based on Google's Bigtable model
- Key-value pairs
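The key-value idea can be illustrated with a toy in-memory model. This is a minimal sketch, not the HBase client API: the classes `CellKey` and `KeyValueSketch` are hypothetical, and row keys and columns are Java strings rather than byte arrays. Each value is addressed by (row key, column, version), and entries are kept sorted by row, then column, then newest version first, which is also the order HBase uses for its stored cells.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical cell coordinate: row key + column ("family:qualifier") + version.
final class CellKey {
    final String row;
    final String column;
    final long version;

    CellKey(String row, String column, long version) {
        this.row = row;
        this.column = column;
        this.version = version;
    }

    // Declared first: static fields are initialized in textual order.
    private static final Comparator<CellKey> BY_VERSION_DESC =
            Comparator.comparingLong((CellKey k) -> k.version).reversed();

    // Sort by row, then column, then version descending (newest first).
    static final Comparator<CellKey> ORDER =
            Comparator.comparing((CellKey k) -> k.row)
                      .thenComparing((CellKey k) -> k.column)
                      .thenComparing(BY_VERSION_DESC);

    @Override
    public String toString() {
        return row + "/" + column + "/" + version;
    }
}

// Toy table: one sorted map from cell coordinate to value.
final class KeyValueSketch {
    private final TreeMap<CellKey, String> cells = new TreeMap<>(CellKey.ORDER);

    void put(String row, String column, long version, String value) {
        cells.put(new CellKey(row, column, version), value);
    }

    // Exact lookup of one (row, column, version) coordinate; null if absent.
    String get(String row, String column, long version) {
        return cells.get(new CellKey(row, column, version));
    }

    // Stored coordinates in sort order, e.g. "com.cnn.www/anchor:cnnsi.com/9".
    List<String> keysInOrder() {
        List<String> out = new ArrayList<>();
        for (CellKey k : cells.keySet()) {
            out.add(k.toString());
        }
        return out;
    }
}
```

Because the map is sorted, all versions of one cell sit next to each other with the newest first, so "read the latest value" is simply "read the first entry for that row and column".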
HBase Logical View

HBase: Keys and Column Families
- Each row has a key
- Each record is divided into column families
- Each column family consists of one or more columns

Key
- Byte array
- Serves as the primary key for the table
- Indexed for fast lookup

Column family
- Has a name (string)
- Contains one or more related columns

Column
- Belongs to one column family
- Included inside the row
- Referenced as familyName:columnName
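As a small illustration of the familyName:columnName convention, here is a hypothetical helper (`ColumnRef`, not part of HBase) that splits a column reference into its family and column parts. It splits on the first colon because HBase does not allow ':' in column family names, while the column part may contain one. Names and keys are turned into UTF-8 byte arrays, which is how the HBase client's `Bytes.toBytes(String)` utility encodes strings.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: splits a "familyName:columnName" reference.
final class ColumnRef {
    final String family;
    final String qualifier; // the column name inside the family

    private ColumnRef(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    // Split on the FIRST colon: family names may not contain ':',
    // but the column part may.
    static ColumnRef parse(String ref) {
        int i = ref.indexOf(':');
        if (i <= 0) {
            throw new IllegalArgumentException("expected familyName:columnName, got " + ref);
        }
        return new ColumnRef(ref.substring(0, i), ref.substring(i + 1));
    }

    // Keys, family names and column names are ultimately byte arrays;
    // this sketch encodes strings as UTF-8.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

For example, `ColumnRef.parse("anchor:apache.com")` yields the family `anchor` and the column `apache.com`.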
Figure annotations:
- Column family named "Contents"
- Column family named "anchor"
- Column named "apache.com"
- Version number: unique within each key; by default the system's timestamp; data type is Long
- Value (cell): byte array
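The version number described above can be sketched with a toy cell that keeps every version in a map sorted newest-first. This is an illustration under stated assumptions, not HBase code: `VersionedCell` is hypothetical, values are strings instead of byte arrays, and when no version is given it uses `System.currentTimeMillis()` as a stand-in for the system timestamp. Writing the same version number twice replaces the earlier value, so version numbers stay unique within the cell, and user-supplied versions may arrive in any order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of one cell's history: version number (a long) -> value.
final class VersionedCell {
    // Reverse order keeps the highest (newest) version first,
    // no matter in which order versions were written.
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    // Explicit, user-supplied version number.
    void put(long version, String value) {
        versions.put(version, value); // same version number => value replaced
    }

    // Implicit version: current time in milliseconds (assumption for this sketch).
    // Two implicit puts within the same millisecond share a version, so the second wins.
    void put(String value) {
        put(System.currentTimeMillis(), value);
    }

    // Value of the highest version, or null if nothing was written.
    String latest() {
        Map.Entry<Long, String> newest = versions.firstEntry();
        return newest == null ? null : newest.getValue();
    }

    // Up to maxVersions version numbers, newest first.
    List<Long> versionNumbers(int maxVersions) {
        List<Long> out = new ArrayList<>();
        for (Long v : versions.keySet()) {
            if (out.size() == maxVersions) {
                break;
            }
            out.add(v);
        }
        return out;
    }
}
```

Writing versions 5, 9 and 3 in that order still leaves 9 as the latest value, and `versionNumbers(2)` returns `[9, 5]`.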
Figure annotations: version number for each row; value

Notes on Data Model
- An HBase schema consists of several tables
- Each table consists of a set of column families
  - Columns are not part of the schema
- HBase has dynamic columns
  - Because column names are encoded inside the cells
  - Different cells can have different columns
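Dynamic columns can be illustrated with a toy table in which the schema fixes only the column family names, while each row stores whatever column names it was written with. `DynamicColumnsTable` is a hypothetical in-memory sketch, not the HBase API; a row that lacks a column simply has no entry for it.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class DynamicColumnsTable {
    // The schema: only the column family names are declared up front.
    private final Set<String> families;
    // row key -> family -> column -> value; only cells that were written exist.
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    DynamicColumnsTable(Set<String> families) {
        this.families = new TreeSet<>(families);
    }

    void put(String row, String family, String column, String value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("unknown column family: " + family);
        }
        // Any column name is accepted: it is stored with the cell, not in the schema.
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(column, value);
    }

    // Column names present in one row for one family (empty set if none).
    Set<String> columnsOf(String row, String family) {
        Map<String, Map<String, String>> byFamily = rows.get(row);
        if (byFamily == null || !byFamily.containsKey(family)) {
            return new TreeSet<>();
        }
        return new TreeSet<>(byFamily.get(family).keySet());
    }
}
```

Writing to an undeclared family is rejected, but two rows of the same family can hold completely different column names.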
Figure annotation: Roles column family has different columns in different cells

Notes on Data Model (Cont'd)
- The version number can be user-supplied
  - Versions do not even have to be inserted in increasing order
  - Version numbers are unique within each key
- A table can be very sparse
  - Many cells are empty
- Keys are indexed as the primary key
Figure annotation: has two columns [cnnsi.com & my.look.ca]

HBase Physical Model
- Each column family is stored in separate files (HFiles)
- Key & version numbers are replicated with each column family
- Empty cells are not stored
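The physical layout rules above can be sketched as a function that turns one logical row into per-family lists of stored entries. This is a toy model, not HBase's file format: `PhysicalLayout` is hypothetical, each entry is a plain string, and a single version number is applied to the whole row for brevity. Every entry repeats the row key and version, and empty (null) cells produce no entry at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PhysicalLayout {
    // One stored entry: the row key and version are repeated in every entry.
    static String entry(String rowKey, String column, long version, String value) {
        return rowKey + "|" + column + "|" + version + "|" + value;
    }

    // logicalRow maps "family:column" -> value (null = empty cell).
    // Returns family -> entries, i.e. one separate "file" per column family.
    // Assumes every map key contains a ':' separator.
    static Map<String, List<String>> split(String rowKey, long version,
                                           Map<String, String> logicalRow) {
        Map<String, List<String>> perFamily = new TreeMap<>();
        for (Map.Entry<String, String> cell : logicalRow.entrySet()) {
            if (cell.getValue() == null) {
                continue; // empty cells are not stored
            }
            String[] parts = cell.getKey().split(":", 2);
            perFamily.computeIfAbsent(parts[0], family -> new ArrayList<>())
                     .add(entry(rowKey, parts[1], version, cell.getValue()));
        }
        return perFamily;
    }
}
```

A row with a `contents:html` value, an `anchor:cnnsi.com` value and an empty `anchor:my.look.ca` cell yields two families, and the `anchor` list holds only the `cnnsi.com` entry.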
HBase maintains a multi-level index on values:
Example
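A multi-level index of this kind can be sketched with nested sorted maps, where a lookup descends from row key to column family to column to version. This is a toy in-memory model under stated assumptions: `MultiLevelIndex` is hypothetical (HBase's real index lives in its store files), and its `get` returns only the highest version, matching the default behavior of a Get.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Four sorted levels: row key -> column family -> column -> version -> value.
final class MultiLevelIndex {
    private final NavigableMap<String,
            NavigableMap<String,
                    NavigableMap<String,
                            NavigableMap<Long, String>>>> index = new TreeMap<>();

    void put(String row, String family, String column, long version, String value) {
        index.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(family, f -> new TreeMap<>())
             // Versions sorted descending so the newest is the first entry.
             .computeIfAbsent(column, c -> new TreeMap<Long, String>(Collections.reverseOrder()))
             .put(version, value);
    }

    // Newest value at (row, family, column), or null if any level is missing.
    String get(String row, String family, String column) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families = index.get(row);
        if (families == null) {
            return null;
        }
        NavigableMap<String, NavigableMap<Long, String>> columns = families.get(family);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }
}
```

Each level narrows the search to one sorted sub-map, so a lookup never touches data belonging to other rows, families or columns.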
Column Families
HBase Regions
- Each table is partitioned horizontally into regions (contiguous ranges of row keys); within a region, each column family is stored separately
- Regions are the counterpart to HDFS blocks
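Horizontal partitioning into regions can be sketched as a sorted map from each region's start row key to the region: the region that holds a row is the one with the greatest start key not larger than that row key. This is a toy model with hypothetical names (`RegionMap`, `region-0`, and so on). It compares Java strings, which for ASCII row keys matches the byte-wise order HBase uses, and it gives the first region an empty start key so that every row belongs to some region.

```java
import java.util.Map;
import java.util.TreeMap;

final class RegionMap {
    // start row key -> region name; regions cover contiguous, non-overlapping key ranges
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    RegionMap(String firstRegionName) {
        // The first region starts at the empty key, so it covers every key
        // smaller than the next region's start key.
        regionsByStartKey.put("", firstRegionName);
    }

    // Adding a start key splits the range that contained it: the new region
    // takes over all keys from startKey up to the next start key.
    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // Greatest start key <= rowKey; never null because "" <= every string.
    String regionFor(String rowKey) {
        Map.Entry<String, String> e = regionsByStartKey.floorEntry(rowKey);
        return e.getValue();
    }
}
```

With regions starting at "", "g" and "p", the row `com.apache.www` falls in the first region and `org.apache.hbase` in the second.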
Figure annotation: each will be one region

HBase Architecture

Three Major Components
- The HBaseMaster: one master
- The HRegionServer: many region servers
- The HBase client
HBase Components
- Region
  - A subset of a table's rows, like horizontal range partitioning
  - Done automatically
- RegionServer (many slaves)
  - Manages data regions
  - Serves data for reads and writes (using a log)
- Master
  - Responsible for coordinating the slaves
  - Assigns regions, detects failures
  - Admin functions

Big Picture
ZooKeeper
- HBase depends on ZooKeeper
- By default, HBase manages the ZooKeeper instance (e.g., starts and stops ZooKeeper)
- HMaster and HRegionServers register themselves with ZooKeeper
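The coordination described above (region servers register with ZooKeeper; the master assigns regions and detects failures) can be sketched as a toy master. It is only an illustration under stated assumptions: `ToyMaster` is hypothetical, a plain set stands in for the ZooKeeper registrations, regions are assigned round-robin (a policy chosen here for simplicity, not HBase's balancer), and failure detection is reduced to an explicit `serverFailed` call. In a real cluster, roughly speaking, a region server's registration is an ephemeral ZooKeeper node that disappears when its session ends, which is how the master learns of the failure.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class ToyMaster {
    // Stand-in for the servers registered in ZooKeeper (insertion order kept).
    private final Set<String> liveServers = new LinkedHashSet<>();
    // region name -> region server currently serving it
    private final Map<String, String> assignment = new TreeMap<>();

    // A region server registers itself when it starts.
    void register(String server) {
        liveServers.add(server);
    }

    // Round-robin over the live servers (hypothetical policy for this sketch).
    void assign(List<String> regions) {
        List<String> servers = new ArrayList<>(liveServers);
        if (servers.isEmpty() && !regions.isEmpty()) {
            throw new IllegalStateException("no live region servers");
        }
        for (int i = 0; i < regions.size(); i++) {
            assignment.put(regions.get(i), servers.get(i % servers.size()));
        }
    }

    // The server's registration vanished: reassign its regions to the survivors.
    void serverFailed(String server) {
        liveServers.remove(server);
        List<String> orphaned = new ArrayList<>();
        for (Map.Entry<String, String> e : assignment.entrySet()) {
            if (e.getValue().equals(server)) {
                orphaned.add(e.getKey());
            }
        }
        assign(orphaned);
    }

    String serverFor(String region) {
        return assignment.get(region);
    }
}
```

With servers `rs1` and `rs2` and regions `r0`, `r1`, `r2`, a failure of `rs1` moves `r0` and `r2` to `rs2`, while `r1` stays where it was.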
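Creating a Table

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

// HBase 2.x client API; runs inside a method declared "throws IOException".
try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
     Admin admin = connection.getAdmin()) {
    // Column family names must not contain ':' (columns are later named family:column).
    TableDescriptor desc = TableDescriptorBuilder.newBuilder(TableName.valueOf("MyTable"))
            .setColumnFamily(ColumnFamilyDescriptorBuilder.of("columnFamily1"))
            .setColumnFamily(ColumnFamilyDescriptorBuilder.of("columnFamily2"))
            .build();
    admin.createTable(desc);
}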
Operations On Regions: Get()
- Given a key, return the corresponding record
- For each value, return the highest version
- Can control the number of versions you want

Operations On Regions: Scan()
Example table (used by both Get() and Scan()):

Row key          Time stamp   Column "anchor:"
com.apache.www   t12
                 t11
                 t10          anchor:apache.com = APACHE
com.cnn.www      t9           anchor:cnnsi.com = CNN
                 t8           anchor:my.look.ca = CNN.com
                 t6
                 t5
                 t3

Get(): Select value from table where key = com.apache.www AND label = anchor:apache.com
Scan(): Select value from table where anchor = cnnsi.com

Operations On Regions: Put()
- Insert a new record (with a new key), or
- Insert a record for an existing key
  - Implicit version number (timestamp)
  - Explicit version number

Operations On Regions: Delete()
- Marking table cells as deleted
- Multiple levels
  - Can mark an entire column family as deleted
  - Can mark all column families of a given row as deleted

All operations are logged by the RegionServers
- The log is flushed periodically

HBase: Joins
- HBase does not support joins
- Joins can be done in the application layer
  - Using scan() and get() operations

Altering a Table
- Disable the table before changing the schema

Logging Operations
HBase Deployment

Figure annotations: master node; slave nodes

HBase vs. HDFS

HBase vs. RDBMS

When to use HBase