Scott Miao 2012/7/12 HBase Admin API & Available Clients

003 admin featuresandclients

1. Scott Miao, 2012/7/12: HBase Admin API & Available Clients

2. Agenda
  • Course Credit
  • HBaseAdmin APIs
      • HTableDescriptor
      • HColumnDescriptor
      • HBaseAdmin
  • Available Clients
      • Interactive Clients
      • Batch Clients
      • Shell
      • Web-based UI

3. Course Credit
  • Showing up: 30 points
  • Asking questions: each question earns 5 points
  • Hands-on: 40 points
  • 70 points are needed to pass this course
  • Course credit is calculated once for each course finished
  • The course credit will be sent to you and your supervisor by mail

4. Hadoop RPC framework
  • Writable interface
      • void write(DataOutput out) throws IOException;
        Serializes the object's data so that it can be sent to the remote side
      • void readFields(DataInput in) throws IOException;
        Deserializes the remote data into an instance for subsequent operations
  • Parameterless constructor
      • Hadoop instantiates an empty object
      • It then calls the readFields method to deserialize the remote data

5. HTableDescriptor
  • Constructors
      • HTableDescriptor();
      • HTableDescriptor(String name);
      • HTableDescriptor(byte[] name);
      • HTableDescriptor(HTableDescriptor desc);
  • Example: ch05/admin.CreateTableExample
  • Can be used to fine-tune the table's performance

6. HTableDescriptor
  • Logical vs. physical views

7. HTableDescriptor - Properties
  • Name: specifies the table name
      byte[] getName();
      String getNameAsString();
      void setName(byte[] name);
  • Column families: specify the column families of the table
      void addFamily(HColumnDescriptor family);
      boolean hasFamily(byte[] c);
      HColumnDescriptor[] getColumnFamilies();
      HColumnDescriptor getFamily(byte[] column);
      HColumnDescriptor removeFamily(byte[] column);
  • Maximum file size: specifies the maximum size a region within the table can grow to
      long getMaxFileSize();
      void setMaxFileSize(long maxFileSize);
    It is really about the maximum size of each store, so a better name would be maxStoreSize. The default is 256 MB; a larger value may be required when you have a lot of data.
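The Writable contract from the Hadoop RPC slide can be tried out without a cluster. The following is a minimal sketch, not Hadoop code: the `Writable` interface here is a local stand-in that only mirrors the two method signatures listed on the slide, and `TableName`, `serialize` and `deserialize` are hypothetical names invented for the illustration. It shows the sender writing fields in a fixed order and the receiver creating an empty instance through the parameterless constructor before calling `readFields`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableSketch {

    /** Local stand-in with the same two methods the slide lists for Hadoop's Writable. */
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical payload type, invented for this example. */
    static class TableName implements Writable {
        String name = "";
        int version;

        TableName() { }  // parameterless constructor, used on the receiving side

        TableName(String name, int version) {
            this.name = name;
            this.version = version;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(name);
            out.writeInt(version);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read in exactly the order write() emitted them.
            name = in.readUTF();
            version = in.readInt();
        }
    }

    /** Sender side: turn the object into bytes. */
    static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        }
        return buffer.toByteArray();
    }

    /** Receiver side: create an empty instance, then fill it via readFields(). */
    static TableName deserialize(byte[] bytes) throws IOException {
        TableName empty = new TableName();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            empty.readFields(in);
        }
        return empty;
    }

    public static void main(String[] args) throws IOException {
        TableName received = deserialize(serialize(new TableName("testtable", 1)));
        System.out.println(received.name + " v" + received.version); // prints "testtable v1"
    }
}
```

The byte array stands in for the network connection; in the real framework the instance is created and filled by Hadoop rather than by a helper such as `deserialize`.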
8. HTableDescriptor - Properties
  • Read-only: by default, all tables are writable. If the flag is set to true, you can only read from the table and not modify it at all.
      boolean isReadOnly();
      void setReadOnly(boolean readOnly);
  • Memstore flush size: the memstore is an in-memory store that buffers values before they are written to disk as a new storage file. The default is 64 MB.
      long getMemStoreFlushSize();
      void setMemStoreFlushSize(long memstoreFlushSize);
  • Deferred log flush: controls how write-ahead-log entries are saved to disk. By default, set to false.
      synchronized boolean isDeferredLogFlush();
      void setDeferredLogFlush(boolean isDeferredLogFlush);
  • Miscellaneous options: stored with the table definition and can be retrieved if necessary.
      byte[] getValue(byte[] key)
      String getValue(String key)
      Map getValues()
      void setValue(byte[] key, byte[] value)
      void setValue(String key, String value)
      void remove(byte[] key)

9. HColumnDescriptor
  • A more appropriate name would be HColumnFamilyDescriptor
  • The family name must be printable
  • You cannot simply rename a family later
  • Constructors
      • HColumnDescriptor();
      • HColumnDescriptor(String familyName);
      • HColumnDescriptor(byte[] familyName);
      • HColumnDescriptor(HColumnDescriptor desc);
      • HColumnDescriptor(byte[] familyName, int maxVersions, String compression, boolean inMemory, boolean blockCacheEnabled, int timeToLive, String bloomFilter);
      • HColumnDescriptor(byte[] familyName, int maxVersions, String compression, boolean inMemory, boolean blockCacheEnabled, int blocksize, int timeToLive, String bloomFilter, int scope);

10. HColumnDescriptor
  • Column families vs. store files

11. HColumnDescriptor - Properties
  • Name: specifies the column family name. A column family cannot be renamed; create a new family with the desired name and copy the data over, using the API.
      byte[] getName();
      String getNameAsString();
  • Maximum versions: predicate deletion. How many versions of each value you want to keep. The default value is 3.
      int getMaxVersions();
      void setMaxVersions(int maxVersions);
  • Compression: HBase has pluggable compression algorithm support.
    The default value is NONE.

12. HColumnDescriptor - Properties
  • Block size: all stored files are divided into smaller blocks that are loaded during a get or scan operation. The default value is 64 KB. (HDFS, by comparison, uses a block size of 64 MB by default.)
      synchronized int getBlocksize();
      void setBlocksize(int s);
  • Block cache: HBase reads entire blocks of data for efficient I/O usage and retains these blocks in an in-memory cache so that subsequent reads do not need any disk operation. The default is true. If your use case only ever has sequential reads on a particular column family, it is advisable to disable it.
      boolean isBlockCacheEnabled();
      void setBlockCacheEnabled(boolean blockCacheEnabled);
  • Time-to-live (TTL): predicate deletion. A threshold based on the timestamp of a value; the internal housekeeping automatically checks whether a value exceeds its TTL. By default, values are kept forever (set to Integer.MAX_VALUE).
      int getTimeToLive();
      void setTimeToLive(int timeToLive);

13. HColumnDescriptor - Properties
  • In-memory: recall the block cache and how HBase uses it to keep entire blocks of data in memory for efficient sequential access to values. The in-memory flag defaults to false. It is good for small column families with few values, such as the passwords of a user table, so that logins can be processed very fast.
      boolean isInMemory();
      void setInMemory(boolean inMemory);
  • Bloom filter: allows you to improve lookup times, given you have a specific access pattern. Since Bloom filters add overhead in terms of storage and memory, they are turned off by default.
  • Replication scope: enables you to have multiple clusters that ship local updates across the network so that they are applied to the remote copies. The default is 0.

14. HBaseAdmin
  • Just like DDL in RDBMSs
      • Create tables with specific column families
      • Check for table existence
      • Alter table and column family definitions
      • Drop tables
      • And more
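Looking back at the two HColumnDescriptor properties marked as predicate deletion (maximum versions and TTL), the following toy model shows how they decide together which versions of a cell remain visible. It is only a sketch and uses no HBase classes: `surviving` is a hypothetical helper, and it assumes timestamps in milliseconds and a TTL in seconds. How and when HBase physically removes the data is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PredicateDeletionSketch {

    /**
     * Returns the version timestamps that survive both predicates, newest first:
     * at most maxVersions versions, and none older than ttlSeconds.
     */
    static List<Long> surviving(List<Long> timestampsMs, int maxVersions,
                                int ttlSeconds, long nowMs) {
        List<Long> newestFirst = new ArrayList<>(timestampsMs);
        newestFirst.sort(Collections.reverseOrder());
        long ttlMs = ttlSeconds * 1000L;

        List<Long> kept = new ArrayList<>();
        for (long ts : newestFirst) {
            if (kept.size() == maxVersions) {
                break;              // version limit reached
            }
            if (nowMs - ts > ttlMs) {
                break;              // this version and all older ones are expired
            }
            kept.add(ts);
        }
        return kept;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        List<Long> versions = Arrays.asList(10_000L, 40_000L, 70_000L, 90_000L, 99_000L);

        // Defaults from the slides: 3 versions, TTL of Integer.MAX_VALUE ("forever").
        System.out.println(surviving(versions, 3, Integer.MAX_VALUE, now)); // [99000, 90000, 70000]

        // Same family with a 20-second TTL: the 70000 version is now too old.
        System.out.println(surviving(versions, 3, 20, now));                // [99000, 90000]
    }
}
```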
15. HBaseAdmin - Basic Operations
  • boolean isMasterRunning()
  • HConnection getConnection()
  • Configuration getConfiguration()
  • close()

16. HBaseAdmin - Table Operations
  • Table-related admin API
      • The calls are asynchronous in nature
      • createTable() vs. createTableAsync(), etc.
  • Create table
      • ch05/admin.CreateTableExample
      • ch05/admin.CreateTableWithRegionsExample
          • numRegions must be at least 3; otherwise, the call will return with an exception
          • This is to ensure that you end up with at least a minimum set of regions

17. HBaseAdmin - Table Operations
  • Does table exist
      • ch05/admin.ListTablesExample
      • You should use existing table names; otherwise, org.apache.hadoop.hbase.TableNotFoundException will be thrown
  • Delete table
      • ch05/admin.TableOperationsExample
      • Disabling a table can potentially take a very long time, up to several minutes
          • This depends on how much data is still held in the servers' memory and not yet persisted to disk
          • Undeploying a region requires all the data to be written to disk first
      • isTableAvailable() vs. isTableEnabled()/isTableDisabled()

18. HBaseAdmin - Table Operations
  • Modify table
      • ch05/admin.ModifyTableExample
      • HTableDescriptor.equals()
          • Compares the current instance with the specified one
          • Returns true if they match in all properties
          • This also includes the contained column families and their respective settings

19. HBaseAdmin - Schema Operations
  • Besides the modifyTable() call, there are dedicated methods provided by HBaseAdmin
      • Make sure the table to be modified is disabled first
      • All of these calls are asynchronous
  • void addColumn(String tableName, HColumnDescriptor column)
  • void addColumn(byte[] tableName, HColumnDescriptor column)
  • void deleteColumn(String tableName, String columnName)
  • void deleteColumn(byte[] tableName, byte[] columnName)
  • void modifyColumn(String tableName, HColumnDescriptor descriptor)
  • void modifyColumn(byte[] tableName, HColumnDescriptor descriptor)
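The rule from the table operations slides that numRegions must be at least 3 is easier to see with the region boundaries written out. The sketch below is an assumption-laden illustration, not the HBaseAdmin implementation: it uses plain `long` keys instead of byte arrays, `splitKeys` is a hypothetical helper, and it assumes that both the start key and the end key act as region boundaries. Under that assumption there is always one region before the start key, at least one between the two keys, and one after the end key.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionSplitSketch {

    /**
     * Returns the numRegions - 1 boundaries, evenly spread from startKey to
     * endKey (both included). Intended for small illustrative key ranges.
     */
    static List<Long> splitKeys(long startKey, long endKey, int numRegions) {
        if (numRegions < 3) {
            throw new IllegalArgumentException("need at least three regions");
        }
        if (startKey >= endKey) {
            throw new IllegalArgumentException("startKey must be smaller than endKey");
        }
        int boundaries = numRegions - 1;
        List<Long> keys = new ArrayList<>();
        for (int i = 0; i < boundaries; i++) {
            // Linear interpolation: i = 0 yields startKey, the last i yields endKey.
            keys.add(startKey + (endKey - startKey) * i / (boundaries - 1));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Five regions: before 0, [0, 33), [33, 66), [66, 100), and from 100 on.
        System.out.println(splitKeys(0, 100, 5)); // [0, 33, 66, 100]
        // The minimum of three regions: before 0, [0, 100), and from 100 on.
        System.out.println(splitKeys(0, 100, 3)); // [0, 100]
    }
}
```

The exact keys HBase computes for byte-array row keys may differ from this integer interpolation; only the counting argument is the point here.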
20. HBaseAdmin - Cluster Operations (methods in the HBaseAdmin class)
  • static void checkHBaseAvailable(Configuration conf)
    Checks that the client application can communicate with the remote HBase cluster; it either silently succeeds or throws an error.
  • ClusterStatus getClusterStatus()
    Retrieves an instance of the ClusterStatus class, containing detailed information about the cluster status.
  • void closeRegion(String regionname, String hostAndPort)
    void closeRegion(byte[] regionname, String hostAndPort)
    Closes regions that have previously been deployed to region servers. This bypasses any master notification: the region is closed directly by the region server, unseen by the master node.
  • void flush(String tableNameOrRegionName)
    void flush(byte[] tableNameOrRegionName)
    Asks the MemStore instances of the region or table to flush the cached modifications to disk. Otherwise the data is written once the memstore flush size is reached.
  • These calls are for advanced users; check the API documentation and handle with care.

21. HBaseAdmin - Cluster Operations (methods in the HBaseAdmin class)
  • void compact(String tableNameOrRegionName)
    void compact(byte[] tableNameOrRegionName)
    Minor compaction. Compactions can potentially take a long time to complete. They are executed in the background by the server hosting the named region, or by all servers hosting any region of the given table.
  • void majorCompact(String tableNameOrRegionName)
    void majorCompact(byte[] tableNameOrRegionName)
    Major compaction.
  • void split(String tableNameOrRegionName)
    void split(byte[] tableNameOrRegionName)
    These calls allow you to split a specific region or table.

22. HBaseAdmin - Cluster Operations (methods in the HBaseAdmin class)
  • void assign(byte[] regionName, boolean force)
    void unassign(byte[] regionName, boolean force)
    When a client requires a region to be deployed to or undeployed from the region servers, it can invoke these calls.
  • void move(byte[] encodedRegionName, byte[] destServerName)
    Moves a region from its current region server to a new one. The destServerName parameter can be set to null to pick a new server at random.
  • boolean balanceSwitch(boolean b)
    boolean balancer()
    balanceSwitch() allows you to switch the region balancer on or off. A call to balancer() starts the process of moving regions from the servers with more deployed regions to those with fewer deployed regions.
  • void shutdown()
    Shuts down the entire cluster.
  • void stopMaster()
    Stops the master server.
  • void stopRegionServer(String hostnamePort)
    Stops a particular region server only.
  • Once invoked, the affected servers are stopped; there is no delay and no way to revert the process.

23. HBaseAdmin - Cluster Status Information
  • You can get more detailed information about your HBase cluster from HBaseAdmin.getClusterStatus()
  • Related classes
      • ClusterStatus
      • ServerName => HServerInfo
      • HServerLoad
      • RegionLoad
  • ch05/admin.ClusterStatusExample

24. Available Clients
  • HBase comes with a variety of clients that can be used from various programming languages
  • Interactive Clients
      • Native Java API
      • REST
      • Thrift
      • Avro
  • Batch Clients
      • MapReduce
      • Hive
      • Pig
  • Shell
  • Web-based UI

25. Available Clients
  • Interactive Clients
      • Native Java API
      • REST
      • Thrift
      • Avro
  • Batch Clients
      • MapReduce
      • Hive
      • Pig
  • Shell
  • Web-based UI
  • (Annotation on slide: "We've already done")

26. Batch Clients - MapReduce framework
  • HDFS: a distributed filesystem
  • MapReduce: a distributed algorithm

27. Batch Clients - MapReduce framework

28. Batch Clients - MapReduce
  • InputFormat and TableInputFormat

29. Batch Clients - MapReduce
  • Mapper and TableMapper

30. Batch Clients - MapReduce
  • Reducer and TableReducer

31. Batch Clients - MapReduce
  • OutputFormat and TableOutputFormat
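The stages named on the MapReduce slides (input, mapper, reducer, output) can be illustrated in a few lines of plain Java. This is only a single-process sketch of the programming model, assuming a word count as the job. It does not use Hadoop or the HBase classes from the slides (TableInputFormat, TableMapper, TableReducer, TableOutputFormat), and the names `map`, `reduce` and `run` are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReduceSketch {

    /** Map phase: one input record (a line) becomes a list of (word, 1) pairs. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    /** Reduce phase: all values collected for one key are folded into one result. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Driver: feeds the input to map(), groups by key (the shuffle), then reduces. */
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hbase hadoop", "hbase pig")));
        // prints {hadoop=1, hbase=2, pig=1}
    }
}
```

In a real job the framework distributes the map and reduce calls across the cluster, and the input and output formats decide where records come from and where results go, for example an HBase table.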
32. Batch Clients - MapReduce
  • Sample: ch07/mapreduce.Driver
  • How to run
      // in root account, in hbase shell
      create 'testtable_mr', 'data'
      // in hbase-user account
      cd ${GIT_HOME}/hbase-training/002/projects/hbase-book/ch07
      // Hadoop fs copyFromLocal
      hadoop fs -copyFromLocal test-data.txt /tmp
      hadoop jar target/hbase-book-ch07-1.0.jar ImportFromFile -t testtable -i /tmp/test-data.txt -c data:json
  • How to use
      hadoop jar target/hbase-book-ch07-1.0.jar   // will show usage

33. Batch Clients - Pig
  • Apache Pig project
      • A platform to analyze large amounts of data
      • It has its own high-level query language, called Pig Latin
      • Pig Latin uses an imperative programming style to formulate the steps involved in transforming the input data to the final output
      • This is the opposite of Hive's declarative approach, which emulates SQL (HiveQL)
  • Combined with the power of Hadoop and the MapReduce framework

34. Batch Clients - Pig Latin Sample
      -- Load data from a file and write to HBase
      raw = LOAD 'tutorial/data/excite-small.log' USING PigStorage('\t') AS (user, time, query);
      T = FOREACH raw GENERATE CONCAT(CONCAT(user, '\u0000'), time), query;
      STORE T INTO 'excite' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query');
      -- Load the records that have just been written from HBase
      R = LOAD 'excite' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query', '-loadKey') AS (key: chararray, query: chararray);

35. Shell
  • We already used it in course #1
      • hbase shell
  • The majority of commands have a direct match with a method provided by either the client or the administrative API
  • Commands are grouped into five different categories, representing their semantic relationships

36. Shell - General

37. Shell - Data definition

38. Shell - Data manipulation

39. Shell - Tools

40. Shell - Replication

41. Web-based UI
  • Master UI (http://${your_host}:8110/master.jsp)
      • Main page
      • User Table page
      • ZooKeeper page
  • Region Server UI
  • Shared pages
      • Local logs
      • Thread dump
      • Log level

42. ~Orz