Page 1: Hive 101: Hive Query Language

IN-0021

Hive 101: Hive Query Language

2014-08-21

Jeff Clouse


® © 2013 Inmar, Inc. All Rights Reserved. © 2014 Inmar, Inc. All Rights Reserved.

Agenda

• What is Hive
• HUE
• HQL

– Select
– Operators
– Functions
– Joins
– Subqueries
– Union

• Hive best practices


What is Hive

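• A high-level layer over MapReduce: Hive compiles queries into MapReduce jobs

• The language is Hive Query Language (HQL)

• HQL is a subset of ANSI SQL, with Hive-specific extensions

• Metadata (the Hive metastore) is stored in MySQL

• Semantics are very much like those of Oracle and MySQL

• There are no UPDATEs: data is changed by loading files or by overwriting whole tables or partitions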


What is Hive

• Hive tables

– External tables

– Warehouse (managed) tables

• Dropping an external table deletes only its metadata; the data files stay where they are

• Dropping a table in the Hive warehouse really deletes it: metadata and data files alike
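The difference between the two kinds of drop can be modeled in a few lines of plain Python. This is a toy sketch, not Hive's metastore API: the `ToyHive` class, the table names and the paths are hypothetical, with one dict standing in for the metastore and another for HDFS.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHive:
    """Toy model (hypothetical): metadata and data files live in separate stores."""
    metastore: dict = field(default_factory=dict)   # table name -> (location, is_external)
    filesystem: dict = field(default_factory=dict)  # location -> list of data files

    def create_table(self, name, location, files, external=False):
        self.metastore[name] = (location, external)
        self.filesystem[location] = list(files)

    def drop_table(self, name):
        location, external = self.metastore.pop(name)  # metadata is always removed
        if not external:
            # Warehouse (managed) table: the data files are deleted as well.
            self.filesystem.pop(location, None)
        # External table: the files at `location` are left in place.


hive = ToyHive()
hive.create_table("trans_ext", "/data/trans", ["f1", "f2"], external=True)
hive.create_table("trans_wh", "/user/hive/warehouse/trans", ["f1"])
hive.drop_table("trans_ext")
hive.drop_table("trans_wh")
print(hive.filesystem)  # {'/data/trans': ['f1', 'f2']}
```

After both drops, neither table is known to the metastore, but the external table's files survive.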


HUE

• Hadoop User Experience

• Provides web-based access to Hive


HQL Select Syntax

• Select
– Select * From t1

• Distinct
– Select Distinct col1 From t1

• Where
– Select * From t1 where col1 = 'US'

• Limit
– Select * From t1 limit 5

• Group By
– Select col1, sum(col2) as Total From t1 group by col1

• Order By
– Select col1, sum(col2) as Total From t1 group by col1 order by col1

• Having
– Select col1, sum(col2) as Total From t1 group by col1 having sum(col2) > 50
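To see how these clauses combine, here is a plain-Python model of `Select col1, sum(col2) as Total From t1 group by col1 having sum(col2) > 50 order by col1 limit 5`. The sample rows and the function name are invented for illustration; Hive itself would run the query as MapReduce jobs.

```python
from collections import defaultdict

# Hypothetical sample rows of t1: (col1, col2)
t1 = [("US", 40), ("CA", 10), ("US", 30), ("MX", 60), ("CA", 20)]


def group_sum_having(rows, threshold, limit):
    totals = defaultdict(int)
    for col1, col2 in rows:                    # group by col1, sum(col2)
        totals[col1] += col2
    kept = [(k, v) for k, v in totals.items() if v > threshold]  # having
    kept.sort(key=lambda kv: kv[0])            # order by col1
    return kept[:limit]                        # limit


print(group_sum_having(t1, 50, 5))  # [('MX', 60), ('US', 70)]
```

HAVING filters groups after aggregation (CA totals 30 and is dropped), whereas WHERE would filter individual rows before it.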


HQL Predicate Operators

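• A = B: equals (NULL if either side is NULL)
• A <=> B: equals, and also true when both sides are NULL
• A <> B, A != B: not equal
• A < B: less than
• A <= B: less than or equal to
• A > B: greater than
• A >= B: greater than or equal to
• A [NOT] BETWEEN B AND C: A is equal to or between the two values
• A IS [NOT] NULL: checks whether A is NULL
• A LIKE B: A matches the pattern B; the wildcards are % (any run of characters) and _ (exactly one character)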
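The difference between `=` and `<=>` matters whenever NULLs are involved. The sketch below models both, plus `LIKE`, in plain Python with `None` standing in for NULL. The function names are hypothetical, and the `LIKE` model ignores escape characters.

```python
import re


def hive_eq(a, b):
    """A = B: NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def hive_null_safe_eq(a, b):
    """A <=> B: TRUE if equal or both NULL, FALSE otherwise; never NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def hive_like(value, pattern):
    """A LIKE B: % matches any run of characters, _ exactly one (no escapes)."""
    if value is None or pattern is None:
        return None
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


print(hive_eq(None, None), hive_null_safe_eq(None, None))     # None True
print(hive_like("US-East", "US%"), hive_like("CART", "C_T"))  # True False
```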


HQL Arithmetic Operators

• A - B: subtract B from A
• A * B: multiply A and B
• A / B: divide A by B
• A + B: add A and B
• A % B: the remainder resulting from A / B

• A & B: bitwise AND of A and B
• A | B: bitwise OR of A and B
• A ^ B: bitwise XOR of A and B
• ~A: bitwise negation of A
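Two details of the arithmetic operators are easy to trip over: `/` returns a fractional result even for two integers, and `%` follows Java's remainder rules (the result takes the sign of A), which differ from Python's own `%`. The plain-Python sketch below models both for integer operands; the function names are hypothetical, and the behavior is worth confirming on your Hive version.

```python
def hive_div(a, b):
    """A / B: fractional result even for two integers, e.g. 5 / 2 -> 2.5 (b != 0 assumed)."""
    return a / b  # Python's true division also returns a float


def hive_mod(a, b):
    """A % B for integers with Java-style semantics: the result has the sign of A."""
    remainder = abs(a) % abs(b)
    return -remainder if a < 0 else remainder


print(hive_div(5, 2))            # 2.5
print(hive_mod(-7, 3), -7 % 3)   # -1 2  (Java-style vs Python's floor-based %)
print(6 & 3, 6 | 3, 6 ^ 3, ~6)   # 2 7 5 -7  (bitwise operators agree for ints)
```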


HQL Logical Operators

• A AND B, A && B: Boolean AND of A and B
• A OR B, A || B: Boolean OR of A and B
• NOT A, !A: Boolean negation of A
• A [NOT] IN (B,…): A is [not] in a set of values
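Like SQL, HQL uses three-valued logic: a comparison with NULL yields NULL, and the Boolean operators combine NULLs as modeled below. This is a plain-Python sketch with `None` as NULL and hypothetical function names. A WHERE clause keeps a row only when its predicate is TRUE, so rows whose predicate is NULL are dropped.

```python
def hive_and(a, b):
    """A AND B: FALSE wins over NULL; otherwise NULL propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def hive_or(a, b):
    """A OR B: TRUE wins over NULL; otherwise NULL propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def hive_not(a):
    """NOT A: the negation of NULL is still NULL."""
    return None if a is None else not a


def hive_in(a, values):
    """A IN (values): NULL if A is NULL (assumes the list itself holds no NULLs)."""
    return None if a is None else a in values


print(hive_and(None, False), hive_or(None, True), hive_not(None))  # False True None
```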


HQL Functions

• Round(A)
• Round(A, 2): round to 2 decimal places
• Floor(A)
• Ceiling(A)
• Rand()

• Year(date)
• Month(date)
• Datediff(date1, date2): number of days from date2 to date1
• Date_add(startdate, days)

• Length(A)
• Upper(A)
• Concat(A, B, …)
• Substring(A, start, len): positions start at 1
• Trim(A)

• Sum(A)
• Count(*)
• Min(A)
• Max(A)
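A few of these functions behave differently from their counterparts in other languages. The plain-Python sketch below models `Round` (assuming Hive's half-up rounding, so 2.5 becomes 3 where Python's `round` gives 2), `Datediff`, `Date_add` and 1-based `Substring`. Dates are assumed to be `yyyy-MM-dd` strings, and the function names are hypothetical.

```python
from datetime import date, timedelta
from decimal import ROUND_HALF_UP, Decimal


def hive_round(a, d=0):
    """Round(A, d), assuming half-up rounding (ties move away from zero)."""
    step = Decimal(1).scaleb(-d)  # 10 ** -d as a Decimal
    return float(Decimal(str(a)).quantize(step, rounding=ROUND_HALF_UP))


def hive_datediff(date1, date2):
    """Datediff(date1, date2): number of days from date2 to date1."""
    return (date.fromisoformat(date1) - date.fromisoformat(date2)).days


def hive_date_add(startdate, days):
    """Date_add(startdate, days), returning a yyyy-MM-dd string."""
    return (date.fromisoformat(startdate) + timedelta(days=days)).isoformat()


def hive_substring(a, start, length):
    """Substring(A, start, len) with 1-based positions (start >= 1 assumed)."""
    if start < 1:
        raise ValueError("this sketch only handles start >= 1")
    return a[start - 1:start - 1 + length]


print(hive_round(2.5), round(2.5))                # 3.0 2
print(hive_datediff("2014-08-21", "2014-08-01"))  # 20
print(hive_date_add("2014-08-21", 15))            # 2014-09-05
print(hive_substring("Inmar Inc", 1, 5))          # Inmar
```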


HQL Joins

• Join
– Select * from table1 t1 join table2 t2 on t1.key = t2.key
– Returns only rows whose key appears in both tables

• Outer Joins
– Left
• Select * from table1 t1 left join table2 t2 on t1.key = t2.key
• Returns all rows from the left table, t1, and the matching rows from the right table; where t2 has no match, its columns are NULL

– Right
• Select * from table1 t1 right join table2 t2 on t1.key = t2.key
• Returns all rows from the right table, t2, and the matching rows from the left table; where t1 has no match, its columns are NULL

– Full
• Select * from table1 t1 full outer join table2 t2 on t1.key = t2.key
• Returns all rows from both tables; where either side has no match, that side's columns are NULL
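The four join types can be compared side by side with a small plain-Python model over `(key, value)` rows. It is a nested-loop sketch for illustration only (Hive distributes the work across MapReduce tasks), the function name is hypothetical, and `None` stands in for NULL. NULL keys never match, because `NULL = NULL` is not TRUE.

```python
def hive_join(left, right, how="inner"):
    """Join lists of (key, value) rows on key; how is inner, left, right or full."""
    out = []
    matched_right = set()
    for lkey, lval in left:
        # NULL keys never match: NULL = NULL is not TRUE.
        hits = [j for j, (rkey, _) in enumerate(right) if lkey is not None and lkey == rkey]
        for j in hits:
            matched_right.add(j)
            out.append((lkey, lval, right[j][0], right[j][1]))
        if not hits and how in ("left", "full"):
            out.append((lkey, lval, None, None))  # right side padded with NULLs
    if how in ("right", "full"):
        for j, (rkey, rval) in enumerate(right):
            if j not in matched_right:
                out.append((None, None, rkey, rval))  # left side padded with NULLs
    return out


table1 = [(1, "a"), (2, "b")]
table2 = [(2, "x"), (3, "y")]
for how in ("inner", "left", "right", "full"):
    print(how, hive_join(table1, table2, how))
```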


HQL SubQueries and Union

• Used to combine multiple result sets
• Only UNION ALL is supported currently, so duplicate rows are kept
• The number and names of columns returned by each select statement must be the same

Select *
from (
    Select col1, col2 from t1
    UNION ALL
    select col1, col2 from t2
) unionResults

• Subqueries are only supported in the from clause, and each must be given an alias (unionResults above)
• Support for subqueries in the where clause will be limited to IN and EXISTS in Hive 0.13
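`UNION ALL` simply concatenates the result sets and keeps duplicates. The plain-Python sketch below models it, along with a hypothetical `distinct` helper showing what a de-duplicating UNION would return; in HQL, a common way to get that effect is `select distinct col1, col2` over the `unionResults` subquery. The sample rows are invented.

```python
def union_all(*selects):
    """UNION ALL: concatenate result sets, keeping duplicate rows."""
    widths = {len(row) for rows in selects for row in rows}
    if len(widths) > 1:
        raise ValueError("each SELECT must return the same number of columns")
    return [row for rows in selects for row in rows]


def distinct(rows):
    """Hypothetical helper: drop duplicate rows, keeping first-seen order."""
    return list(dict.fromkeys(rows))


from_t1 = [("US", 1), ("CA", 2)]  # select col1, col2 from t1 (sample rows)
from_t2 = [("US", 1)]             # select col1, col2 from t2 (sample rows)
print(union_all(from_t1, from_t2))            # [('US', 1), ('CA', 2), ('US', 1)]
print(distinct(union_all(from_t1, from_t2)))  # [('US', 1), ('CA', 2)]
```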


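Hive best practices

• Smallest to largest tables for joins: Hive buffers the earlier tables in a join and streams the last one, so put the largest table last
• Data layout

– Partition large tables
– Use the partition column in your where clause, so Hive reads only the matching partitions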


Partitioning – by Month

[Figure: the Trans table, partitioned by month (Jan, Feb, …, Dec), with the files within each partition, e.g. F0100–F0105 for Jan, F0200–F0203 for Feb, F1200–F1204 for Dec]
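Partitioning pays off through partition pruning: when the where clause filters on the partition column, Hive reads only the matching partition directories. The plain-Python model below assumes a hypothetical `month=NN` directory per partition of the Trans table (Hive names partition directories `column=value`) and uses a subset of the file names from the figure.

```python
# Hypothetical layout of the Trans table: one directory per month partition.
trans_partitions = {
    "month=01": ["F0100", "F0101", "F0102"],
    "month=02": ["F0200", "F0201"],
    "month=12": ["F1200", "F1201", "F1202"],
}


def files_to_scan(partitions, month=None):
    """Return the data files a query must read, given an optional month filter."""
    if month is None:
        # No predicate on the partition column: every partition is scanned.
        return [f for files in partitions.values() for f in files]
    # Predicate on the partition column: only the matching directory is read.
    return list(partitions.get(f"month={month:02d}", []))


print(len(files_to_scan(trans_partitions)))  # 8
print(files_to_scan(trans_partitions, 2))    # ['F0200', 'F0201']
```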


Hive best practices

• Smallest to largest tables for joins
• Data layout

– Partition large tables
– Use the partition in your where clause
– Bucketing


Bucketing – by Basket_id

[Figure: the Trans and Trans_item tables, each bucketed by Basket_id; each bucket is a file containing the rows whose Basket_id has the same hash]
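Because Trans and Trans_item are bucketed the same way on basket_id, matching rows always land in the same bucket number, so a join can pair bucket N of one table with bucket N of the other. The plain-Python sketch below models that idea. It assumes, as a simplification, that an integer key hashes to itself; Hive's actual hash function depends on the column type and Hive version. The bucket count, rows and function names are hypothetical.

```python
NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_for(basket_id, num_buckets=NUM_BUCKETS):
    # Simplification: an integer key hashes to itself (not Hive's exact hash).
    return (basket_id & 0x7FFFFFFF) % num_buckets


def bucketize(rows, num_buckets=NUM_BUCKETS):
    """Split (basket_id, ...) rows into buckets keyed by bucket number."""
    buckets = {b: [] for b in range(num_buckets)}
    for row in rows:
        buckets[bucket_for(row[0], num_buckets)].append(row)
    return buckets


def bucketed_join(left_buckets, right_buckets):
    """Join bucket b of one table only against bucket b of the other."""
    out = []
    for b, left_rows in left_buckets.items():
        for lkey, lval in left_rows:
            for rkey, rval in right_buckets[b]:
                if lkey == rkey:
                    out.append((lkey, lval, rval))
    return out


trans = [(101, "2014-01-03"), (102, "2014-01-04"), (105, "2014-02-01")]
trans_item = [(101, "milk"), (101, "bread"), (105, "eggs")]
print(bucketed_join(bucketize(trans), bucketize(trans_item)))
```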


Hive best practices

• Smallest to largest tables for joins
• Data layout

– Partition large tables
– Use the partition in your where clause
– Bucketing

• Data sampling

– Bucket: TABLESAMPLE(bucket 30 out of 64 on basket_id)
– Block: TABLESAMPLE(1 PERCENT)

• Parallel processing

– set hive.exec.parallel=true; lets Hive run independent stages of a query in parallel
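Bucket sampling picks whole buckets: `TABLESAMPLE(bucket 30 out of 64 on basket_id)` keeps the rows that hash into the 30th of 64 buckets (numbered 1 to 64), roughly 1/64 of the table. The plain-Python sketch below assumes, as a simplification, that an integer key hashes to itself; Hive's real hash function depends on the column type and version. The function name and basket ids are hypothetical.

```python
def tablesample_bucket(rows, key_index, x, y):
    """Model of TABLESAMPLE(bucket x out of y on col); buckets are numbered 1..y."""
    if not 1 <= x <= y:
        raise ValueError("x must be between 1 and y")
    # Simplification: an integer key hashes to itself (not Hive's exact hash).
    return [row for row in rows if (row[key_index] & 0x7FFFFFFF) % y == x - 1]


baskets = [(basket_id,) for basket_id in range(256)]  # hypothetical basket ids
sample = tablesample_bucket(baskets, 0, 30, 64)
print([row[0] for row in sample])  # [29, 93, 157, 221]
```

Block sampling with `TABLESAMPLE(1 PERCENT)` works differently: it selects input at the HDFS block level, so the sample size is only approximate.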


Questions?

