Scott Miao 2012/7/12 HBase Admin API & Available Clients


Agenda

Course Credit

HBase Admin APIs

HTableDescriptor

HColumnDescriptor

HBaseAdmin

Available Clients

Interactive Clients

Batch Clients

Shell

Web-based UI


Course Credit

Show up: 30 points

Ask a question: 5 points per question

Hands-on: 40 points

70 points are needed to pass this course

Course credit is calculated once for each course finished

The course credit will be sent to you and your supervisor by mail


Hadoop RPC framework

Writable interface

void write(DataOutput out) throws IOException;

Serializes the object's data so that it can be sent to the remote side

void readFields(DataInput in) throws IOException;

Deserializes the remote data into a fresh instance for subsequent operations

Parameterless constructor

Hadoop instantiates an empty object

It then calls the readFields method to deserialize the remote data
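The round trip described on this slide can be sketched with nothing but the JDK. This is a minimal illustration, not Hadoop's RPC code: the local Writable interface only mirrors the two method signatures shown above, and TableInfo, send and receive are hypothetical names introduced for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WritableSketch {

    // Local stand-in declaring the same two methods as the Writable interface on the slide.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical payload type, used only for this illustration.
    static class TableInfo implements Writable {
        String name = "";
        int maxVersions;

        // Parameterless constructor: the receiving side creates an empty object first.
        TableInfo() { }

        TableInfo(String name, int maxVersions) {
            this.name = name;
            this.maxVersions = maxVersions;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(name);
            out.writeInt(maxVersions);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read in the same order in which they were written.
            name = in.readUTF();
            maxVersions = in.readInt();
        }
    }

    // Sender side: serialize the object into bytes.
    static byte[] send(Writable w) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Receiver side: create an empty instance, then fill it via readFields.
    static TableInfo receive(byte[] wire) {
        TableInfo empty = new TableInfo();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            empty.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return empty;
    }

    public static void main(String[] args) {
        TableInfo copy = receive(send(new TableInfo("testtable", 3)));
        System.out.println(copy.name + " " + copy.maxVersions);
    }
}
```

The parameterless constructor matters because the receiver has no field values yet when it creates the object; everything is filled in by readFields afterwards.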


HTableDescriptor

Constructor

HTableDescriptor();

HTableDescriptor(String name);

HTableDescriptor(byte[] name);

HTableDescriptor(HTableDescriptor desc);

ch05/admin.CreateTableExample

Can be used to fine-tune the table’s performance


HTableDescriptor - Logical vs. physical views


HTableDescriptor - Properties

Property: Name

Description: Specifies the table name

byte[] getName();

String getNameAsString();

void setName(byte[] name);

Property: Column Families

Description: Specifies the column families of the table

void addFamily(HColumnDescriptor family);

boolean hasFamily(byte[] c);

HColumnDescriptor[] getColumnFamilies();

HColumnDescriptor getFamily(byte[] column);

HColumnDescriptor removeFamily(byte[] column);

Property: Maximum File Size

Description: Specifies the maximum size a region within the table can grow to

long getMaxFileSize();

void setMaxFileSize(long maxFileSize);

This is really the maximum size of each store, so a better name would be maxStoreSize. The default size is 256 MB; a larger value may be required when you have a lot of data.


HTableDescriptor - Properties

Property: Read-only

Description: By default, all tables are writable. If the flag is set to true, you can only read from the table and not modify it at all.

boolean isReadOnly();

void setReadOnly(boolean readOnly);

Property: Memstore flush size

Description: The memstore is an in-memory store that buffers values before they are written to disk as a new storage file. This property sets the size at which it is flushed; the default is 64 MB.

long getMemStoreFlushSize();

void setMemStoreFlushSize(long memstoreFlushSize);

Property: Deferred log flush

Description: Controls whether write-ahead-log entries are saved to disk with deferred flushing. By default, it is set to false.

synchronized boolean isDeferredLogFlush();

void setDeferredLogFlush(boolean isDeferredLogFlush);

Property: Miscellaneous options

Description: Key/value pairs that are stored with the table definition and can be retrieved if necessary

byte[] getValue(byte[] key)

String getValue(String key)

Map<ImmutableBytesWritable, ImmutableBytesWritable> getValues()

void setValue(byte[] key, byte[] value)

void setValue(String key, String value)

void remove(byte[] key)


HColumnDescriptor

A more appropriate name would be HColumnFamilyDescriptor

The family name must be printable

You cannot simply rename them later

Constructor

HColumnDescriptor();

HColumnDescriptor(String familyName);

HColumnDescriptor(byte[] familyName);

HColumnDescriptor(HColumnDescriptor desc);

HColumnDescriptor(byte[] familyName, int maxVersions, String compression, boolean inMemory, boolean blockCacheEnabled, int timeToLive, String bloomFilter);

HColumnDescriptor(byte[] familyName, int maxVersions, String compression, boolean inMemory, boolean blockCacheEnabled, int blocksize, int timeToLive, String bloomFilter, int scope);


HColumnDescriptor - Column families vs. store files


HColumnDescriptor - Properties

Property: Name

Description: Specifies the column family name. A column family cannot be renamed; instead, create a new family with the desired name and copy the data over using the API.

byte[] getName();

String getNameAsString();

Property: Maximum versions

Description: Predicate deletion. Sets how many versions of each value you want to keep. The default value is 3.

int getMaxVersions();

void setMaxVersions(int maxVersions);

Property: Compression

Description: HBase has pluggable compression algorithm support. The default value is NONE.


HColumnDescriptor – Properties

Property: Block size

Description: All stored files are divided into smaller blocks that are loaded during a get or scan operation. The default value is 64 KB.

synchronized int getBlocksize();

void setBlocksize(int s);

By contrast, HDFS uses a default block size of 64 MB.

Property: Block cache

Description: HBase reads entire blocks of data for efficient I/O usage and retains these blocks in an in-memory cache so that subsequent reads do not need any disk operation. The default is true.

boolean isBlockCacheEnabled();

void setBlockCacheEnabled(boolean blockCacheEnabled);

If your use case only ever has sequential reads on a particular column family, it is advisable to disable it.

Property: Time-to-live (TTL)

Description: Predicate deletion. Sets a threshold based on the timestamp of a value; internal housekeeping automatically checks whether a value exceeds its TTL.

int getTimeToLive();

void setTimeToLive(int timeToLive);

By default, values are kept forever (the TTL is set to Integer.MAX_VALUE).
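The two predicate-deletion settings, maximum versions and TTL, can be pictured with a small JDK-only model. This is a sketch of the idea, not HBase's housekeeping code: RetentionSketch and retain are hypothetical names, timestamps are plain milliseconds, and the TTL is assumed to be given in seconds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RetentionSketch {

    // Returns the version timestamps (ms) that survive both rules, newest first.
    static List<Long> retain(List<Long> timestampsMs, int maxVersions, int ttlSeconds, long nowMs) {
        List<Long> newestFirst = new ArrayList<>(timestampsMs);
        newestFirst.sort(Collections.reverseOrder());

        List<Long> kept = new ArrayList<>();
        for (long ts : newestFirst) {
            if (kept.size() == maxVersions) {
                break; // version limit reached, all older versions are dropped
            }
            // Integer.MAX_VALUE is treated as "keep forever", as on the slide.
            boolean expired = ttlSeconds != Integer.MAX_VALUE
                    && nowMs - ts > ttlSeconds * 1000L;
            if (!expired) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> versions = List.of(1000L, 2000L, 3000L, 4000L);
        // Default-like settings: three versions, no expiry.
        System.out.println(retain(versions, 3, Integer.MAX_VALUE, 5000L));
        // Same data with a 2-second TTL evaluated at t = 5000 ms.
        System.out.println(retain(versions, 3, 2, 5000L));
    }
}
```

In this model a value disappears as soon as either rule excludes it, which is why both properties are described as predicate deletion.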


HColumnDescriptor - Properties

Property: In-memory

Description: Relates to the block cache, which HBase uses to keep entire blocks of data in memory for efficient sequential access to values. The in-memory flag defaults to false.

boolean isInMemory();

void setInMemory(boolean inMemory);

Setting the flag is good for small column families with few values, such as the passwords of a user table, so that logins can be processed very fast.

Property: Bloom filter

Description: Bloom filters can improve lookup times if you have a specific access pattern. Since they add overhead in terms of storage and memory, they are turned off by default.

Property: Replication scope

Description: Replication lets multiple clusters ship local updates across the network so that they are applied to the remote copies. The default scope is 0.


HBaseAdmin

Just like DDL in an RDBMS

Create tables with specific column families

Check for table existence

Alter table and column family definitions

Drop tables

And more…


HBaseAdmin – Basic Operations

boolean isMasterRunning()

HConnection getConnection()

Configuration getConfiguration()

close()


HBaseAdmin – Table Operations

Table-related admin API

These calls are asynchronous in nature

createTable() vs. createTableAsync(), etc.

Create Table

ch05/admin.CreateTableExample

ch05/admin.CreateTableWithRegionsExample

numRegions must be at least 3; otherwise, the call returns with an exception

This ensures that you end up with at least a minimum set of regions
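One way to see why the minimum is 3: if the start and end keys serve as the first and last region boundaries, a table with N regions needs N - 1 boundaries, and two boundaries already imply three regions (before the start key, between the keys, after the end key). The sketch below works under that assumption and uses plain long values instead of byte[] row keys; RegionSplitSketch and splitKeys are hypothetical names, and the even spacing is an illustration rather than HBase's own split routine.

```java
import java.util.Arrays;

class RegionSplitSketch {

    // Returns numRegions - 1 boundaries: startKey, evenly spaced interior keys, endKey.
    static long[] splitKeys(long startKey, long endKey, int numRegions) {
        if (numRegions < 3) {
            throw new IllegalArgumentException("Must create at least three regions");
        }
        if (startKey >= endKey) {
            throw new IllegalArgumentException("Start key must be smaller than end key");
        }
        int boundaries = numRegions - 1; // at least 2, so the division below is safe
        long[] keys = new long[boundaries];
        for (int i = 0; i < boundaries; i++) {
            // Integer arithmetic is sufficient for the small key ranges used in this sketch.
            keys[i] = startKey + (endKey - startKey) * i / (boundaries - 1);
        }
        return keys;
    }

    public static void main(String[] args) {
        // Ten regions over the key range 1..100 need nine boundaries.
        System.out.println(Arrays.toString(splitKeys(1L, 100L, 10)));
        // Three regions need exactly the start and end key.
        System.out.println(Arrays.toString(splitKeys(1L, 100L, 3)));
    }
}
```

With fewer than three regions there would be no way to honor both the start key and the end key as boundaries, so the request is rejected up front.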


HBaseAdmin - Table Operations

Check whether a table exists

ch05/admin.ListTablesExample

You should use existing table names

Otherwise, org.apache.hadoop.hbase.TableNotFoundException will be thrown

Delete Table

ch05/admin.TableOperationsExample

Disabling a table can potentially take a very long time, up to several minutes

This depends on how much data still resides in the server's memory and has not yet been persisted to disk

Undeploying a region requires all the data to be written to disk first

isTableAvailable() vs. isTableEnabled()/isTableDisabled()


HBaseAdmin – Table Operations

Modify Table

ch05/admin.ModifyTableExample

HTableDescriptor.equals()

Compares the current instance with the specified one

Returns true if they match in all properties

This includes the contained column families and their respective settings


HBaseAdmin – Schema Operations

Besides the modifyTable() call, HBaseAdmin provides dedicated methods for schema changes

Make sure the table to be modified is disabled first

All of these calls are asynchronous

void addColumn(String tableName, HColumnDescriptor column)

void addColumn(byte[] tableName, HColumnDescriptor column)

void deleteColumn(String tableName, String columnName)

void deleteColumn(byte[] tableName, byte[] columnName)

void modifyColumn(String tableName, HColumnDescriptor descriptor)

void modifyColumn(byte[] tableName, HColumnDescriptor descriptor)


HBaseAdmin – Cluster Operations

Methods:

static void checkHBaseAvailable(Configuration conf)

Description: Checks whether the client application can communicate with the remote HBase cluster; the call either succeeds silently or throws an error.

Methods:

ClusterStatus getClusterStatus()

Description: Retrieves an instance of the ClusterStatus class, containing detailed information about the cluster status.

Methods:

void closeRegion(String regionname, String hostAndPort)

void closeRegion(byte[] regionname, String hostAndPort)

Description: Closes regions that have previously been deployed to region servers. This bypasses any master notification: the region is closed directly by the region server, unseen by the master node.

Methods:

void flush(String tableNameOrRegionName)

void flush(byte[] tableNameOrRegionName)

Description: Asks the MemStore instances of the region or table to flush the cached modifications to disk. Without this call, the data is written only once the memstore flush size is reached.

These calls are for advanced users; please check the API documentation and handle them with care


HBaseAdmin - Cluster Operations

Methods:

void compact(String tableNameOrRegionName)

void compact(byte[] tableNameOrRegionName)

Description: Minor compaction. Compactions can potentially take a long time to complete. The compaction is executed in the background by the server hosting the named region, or by all servers hosting any region of the given table.

Methods:

void majorCompact(String tableNameOrRegionName)

void majorCompact(byte[] tableNameOrRegionName)

Description: Major compaction.

Methods:

void split(String tableNameOrRegionName)

void split(byte[] tableNameOrRegionName)

…

Description: These calls allow you to split a specific region or table.


HBaseAdmin - Cluster Operations

Methods:

void assign(byte[] regionName, boolean force)

void unassign(byte[] regionName, boolean force)

Description: When a client needs a region to be deployed to, or undeployed from, the region servers, it can invoke these calls.

Methods:

void move(byte[] encodedRegionName, byte[] destServerName)

Description: Moves a region from its current region server to a new one. The destServerName parameter can be set to null to pick a new server at random.

Methods:

boolean balanceSwitch(boolean b)

boolean balancer()

Description: balanceSwitch() switches the region balancer on or off. A call to balancer() starts the process of moving regions from the servers with more deployed regions to those with fewer.

Methods:

void shutdown()

void stopMaster()

void stopRegionServer(String hostnamePort)

Description: shutdown() shuts down the entire cluster, stopMaster() stops the master server, and stopRegionServer() stops a particular region server only. Once invoked, the affected servers are stopped: there is no delay and no way to revert the process.


HBaseAdmin - Cluster Status Information

You can get more detailed information about your HBase cluster from HBaseAdmin.getClusterStatus()

Related Classes

ClusterStatus

ServerName => HServerInfo

HServerLoad

RegionLoad

ch05/admin.ClusterStatusExample


Available Clients

HBase comes with a variety of clients that can be used from various programming languages

Interactive Clients: Native Java API, REST, Thrift, Avro

Batch Clients: MapReduce, Hive, Pig

Shell

Web-based UI


Available Clients

Interactive Clients

Native Java API

REST

Thrift

Avro

Batch Clients

MapReduce

Hive

Pig

Shell

Web-based UI

We’ve already done


Batch Clients – MapReduce framework

HDFS: A distributed filesystem

MapReduce: A distributed algorithm


Batch Clients - MapReduce framework


Batch Clients - MapReduce

InputFormat and TableInputFormat


Batch Clients - MapReduce

Mapper and TableMapper


Batch Clients - MapReduce

Reducer and TableReducer


Batch Clients - MapReduce

OutputFormat and TableOutputFormat


Batch Clients - MapReduce Sample

ch07/mapreduce.Driver

How to run

//in root account, in hbase shell

create 'testtable_mr', 'data'

//in hbase-user account

cd ${GIT_HOME}/hbase-training/002/projects/hbase-book/ch07

hadoop fs -copyFromLocal test-data.txt /tmp

hadoop jar target/hbase-book-ch07-1.0.jar ImportFromFile -t testtable_mr -i /tmp/test-data.txt -c data:json

How to use

hadoop jar target/hbase-book-ch07-1.0.jar //will show usage


Batch Clients - Pig

Apache Pig project

A platform to analyze large amounts of data

It has its own high-level query language, called Pig Latin

Pig Latin uses an imperative programming style to formulate the steps involved in transforming the input data to the final output

This is the opposite of Hive's declarative approach, which emulates SQL (HiveQL)

It is combined with the power of Hadoop and the MapReduce framework


Batch Clients – Pig Latin Sample

-- Load data from a file and write it to HBase

raw = LOAD 'tutorial/data/excite-small.log' USING PigStorage('\t')
    AS (user, time, query);

T = FOREACH raw GENERATE
    CONCAT(CONCAT(user, '\u0000'), time), query;

STORE T INTO 'excite' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query');

-- Load the records that were just written back from HBase

R = LOAD 'excite' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query',
    '-loadKey') AS (key: chararray, query: chararray);
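The nested CONCAT in the sample builds the row key from the user, a zero character, and the time. The same composition can be sketched in plain Java to show how the separator keeps the two parts recoverable; CompositeKeySketch, rowKey and parts are hypothetical names, and the sample values are made up for illustration.

```java
class CompositeKeySketch {

    // Zero character used as separator, matching the '\u0000' literal in the Pig script.
    static final char SEPARATOR = '\u0000';

    // Builds the composite key the way the nested CONCAT calls do: user, separator, time.
    static String rowKey(String user, String time) {
        return user + SEPARATOR + time;
    }

    // Splits a composite key back into its user and time parts at the first separator.
    static String[] parts(String rowKey) {
        int idx = rowKey.indexOf(SEPARATOR);
        if (idx < 0) {
            throw new IllegalArgumentException("No separator in row key");
        }
        return new String[] { rowKey.substring(0, idx), rowKey.substring(idx + 1) };
    }

    public static void main(String[] args) {
        String key = rowKey("user42", "970916105432");
        String[] split = parts(key);
        System.out.println(split[0] + " / " + split[1]);
    }
}
```

A separator that cannot occur in the user or time fields avoids ambiguity between, for example, user "ab" with time "1" and user "a" with time "b1".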


Shell

We already used it in course #1

hbase shell

The majority of commands have a direct match with a method provided by either the client or the administrative API

The commands are grouped into five categories that represent their semantic relationships


Shell - General


Shell – Data definition


Shell – Data manipulation


Shell – Tools


Shell – Replication


Web-based UI

Master UI (http://${your_host}:8110/master.jsp)

Main page

User Table page

Zookeeper page

Region Server UI

Shared pages

Local logs

Thread Dump

Log level


Phew~ finally finished… Orz
