
Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Hive on Spark is Blazing Fast… Or Is It?

Carter Shanklin and Mostafa Mokhtar


Why SQL on Hadoop? Solving for Scale.

Hadoop is great for cost, but MapReduce is too difficult.

SQL on Hadoop makes Hadoop real and gives me scale that traditional SQL can’t offer.

I’m deleting important data because it’s too expensive to store it.


SQL at Facebook: Emergence of Apache Hive

● Developed Hive to address traditional RDBMS limitations.
● 300+ PB of data under management(1).
● 600+ TB of data loaded daily.
● 60,000+ Hive queries per day(2).
● More than 1,000 users per day.
● Initial Apache release in April 2009.


Hive Classic: Strengths and Challenges

+ Familiar SQL Interface

+ Economical Processing of Petabytes

- Hive Classic tied to MapReduce, leading to latency

Traditional SQL Workloads Needed Higher Performance!


Need for Speed: The Stinger Initiative

Stinger: An Open Roadmap to improve Apache Hive’s performance 100x.

Launched: February 2013; Delivered: April 2014.

Delivered in 100% Apache Open Source.

Vectorized SQL Engine (SQL engine) + ORCFile (columnar storage) + Apache Tez (distributed execution) = 100x


Stinger Phase 3: TPC-DS Benchmark at 30 Terabyte Scale

Sample of 50 queries from TPC-DS at 30 terabyte scale.

Average 52x Query Speedup, Maximum 160x Query Speedup.

Total benchmark time decreased from 7.8 days to 9.3 hours.(3)

Cost-Based Optimizer added in Hive 0.14 gave an additional 2.5x speedup.
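The total-time figure above can be turned into a speedup factor with simple arithmetic. The sketch below is only an illustration (the function name is hypothetical). It shows that the total benchmark time improved by roughly 20x, which is lower than the 52x average per-query speedup because the total is weighted toward the longest-running queries.

```python
HOURS_PER_DAY = 24

def total_speedup(before_hours: float, after_hours: float) -> float:
    """Ratio of total wall-clock time before and after an optimization."""
    return before_hours / after_hours

# Figures from the slide: 7.8 days before, 9.3 hours after.
stinger_total = total_speedup(7.8 * HOURS_PER_DAY, 9.3)
print(f"Total benchmark speedup: {stinger_total:.1f}x")
```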


Hive + Stinger at Yahoo

Scale: Around 1 million Hive jobs run every month.

Performance: Total benchmark time reduced from 8.1 hours to 1.3 hours at 10TB scale.

Performance: Up to 82x faster.(4)


Stinger at Spotify

Speed: Query 25 TB of compressed data in 10 minutes across 690 nodes (MapReduce was too slow to complete).

Efficiency: 16x less HDFS read when using ORCFile versus Avro.(5)


ORCFile at Facebook

Compression: Saved more than 1,400 servers' worth of storage.

Compression: Compression ratio increased from 5x to 8x globally.


Hive on Tez: Conclusion

Hive on Tez delivers fast batch and interactive SQL today.

But users need more speed!

Scale: Proven at petabyte scale.

SQL: The most comprehensive open-source SQL on Hadoop.

Speed: More than 90 Hortonworks customers use Hive-on-Tez today for fast SQL.

Hortonworks Customer Support metrics as of Feb/2015


Next Stop: Stinger.next and Sub-Second SQL

The emergence of LLAP and Hive-on-Spark brings sub-second within reach.

What does it take to get Hive to sub-second?

Does Hive-on-Spark get us there?


Performance Today and the Sub-Second Future

Hive on Tez, Hive on Spark, Hive on MapReduce & Spark-SQL

Query processing in Hadoop

[Figure: query processing stack]

● SQL support: HiveQL; Spark-SQL
● SQL engine: Hive engine; Spark-SQL engine
● Distributed execution engine: Tez, MapReduce, Spark
● Columnar storage: ORC file, Parquet file
● Block cache: Linux cache


[Figure: the query processing stack repeated, annotated "What is covered today in terms of performance".]


Performance comparison : Test bed

Software:

Component   Version
Hive        1.2.0
Tez         0.5.2
Spark       1.2.0
Hadoop      2.6.0

Hardware:

20 physical nodes, each with:

● 2x Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz for total of 16 CPU cores/machine

● Hyper-threading enabled

● 256GB RAM per node

● 6x 4TB WDC WD4000FYYZ-0 drives per node

● 10 Gigabit interconnect between the nodes

Note: Based on the YARN Node Manager’s memory resource setting used below, only 128 GB of RAM per node was dedicated to query processing.

Performance benchmarks:

Execution engine   Primitives (30TB)   TPC-DS queries (30TB)   TPC-DS queries (200GB)
Spark              X                   X                       X
Tez                X                   X                       X
Map Reduce         X
Spark-SQL          X                   X                       X


Performance comparison : Configurations

Hive on Tez

● 128GB of memory allocated

● 16 out of 32 logical processors allocated

● hive.execution.engine = tez

● hive.auto.convert.join.noconditionaltask.size = 600MB

● Vectorization enabled

● CBO enabled

● Fetch column stats enabled

Other settings

● hive.prewarm.numcontainers = 317

● hive.tez.auto.reducer.parallelism = true
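As a rough illustration, the Hive on Tez settings listed above could be rendered as session-level `set` statements. This is a sketch under assumptions: the slide only says "Vectorization enabled", "CBO enabled" and "Fetch column stats enabled", so the three property names marked as assumed are my mapping of those phrases to Hive properties, and 600MB is interpreted as 600 * 1024 * 1024 bytes.

```python
MB = 1024 * 1024  # assumption: the slide's "600MB" means mebibytes

# Values come from the slide; entries marked "assumed" map a slide phrase
# to a Hive property name and are not spelled out in the original deck.
HIVE_ON_TEZ_SETTINGS = {
    "hive.execution.engine": "tez",
    "hive.auto.convert.join.noconditionaltask.size": str(600 * MB),
    "hive.vectorized.execution.enabled": "true",  # assumed: "Vectorization enabled"
    "hive.cbo.enable": "true",                    # assumed: "CBO enabled"
    "hive.stats.fetch.column.stats": "true",      # assumed: "Fetch column stats enabled"
    "hive.prewarm.numcontainers": "317",
    "hive.tez.auto.reducer.parallelism": "true",
}

def to_set_statements(settings):
    """Render each key/value pair as a Hive 'set key=value;' statement."""
    return [f"set {key}={value};" for key, value in settings.items()]

statements = to_set_statements(HIVE_ON_TEZ_SETTINGS)
print("\n".join(statements))
```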

Hive on Spark



allocated

● hive.execution.engine=spark

● Configuration parameters followed the recommendations from the Hive on Spark wiki (http://tinyurl.com/pk2ju8e), which also had CBO, vectorization and fetch column stats enabled.

● spark.master=yarn-master

Spark settings

● spark.shuffle.memoryFraction = 0.5

● spark.storage.memoryFraction = 0.1

● spark.shuffle.consolidateFiles = true

● spark.serializer = org.apache.spark.serializer.KryoSerializer

Spark-SQL



allocated

● spark.shuffle.memoryFraction = 0.5

● spark.storage.memoryFraction = 0.1

● spark.shuffle.consolidateFiles = true

● spark.serializer = org.apache.spark.serializer.KryoSerializer

● spark.sql.shuffle.partitions = 1009

● spark-sql --master yarn-client

● driver-memory 8g

● Default GC configuration

spark.sql.codegen was not enabled as it caused most queries to fail.

http://tinyurl.com/pk2ju8e
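The Spark-SQL settings above could be assembled into a single command line, sketched below. The slide does not say how the properties were supplied, so passing them with `--conf` (rather than through a defaults file) is an assumption, and the helper name is hypothetical.

```python
import shlex

# Property values as listed on the slide.
SPARK_SQL_CONF = {
    "spark.shuffle.memoryFraction": "0.5",
    "spark.storage.memoryFraction": "0.1",
    "spark.shuffle.consolidateFiles": "true",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.sql.shuffle.partitions": "1009",
}

def spark_sql_command(conf, master="yarn-client", driver_memory="8g"):
    """Build the argument list for a spark-sql invocation."""
    args = ["spark-sql", "--master", master, "--driver-memory", driver_memory]
    for key, value in conf.items():
        # Assumption: properties are passed on the command line via --conf.
        args += ["--conf", f"{key}={value}"]
    return args

command = spark_sql_command(SPARK_SQL_CONF)
print(" ".join(shlex.quote(arg) for arg in command))
```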


Performance comparison : TPC-DS 200GB

● Warm timings reported, Cold queries on Spark are significantly slower

● Hive on Tez using ORC format

● Hive on Spark using Parquet format

● Spark-sql using Parquet format

Total elapsed time (chart values): Hive on Tez 1,118; Hive on Spark 1,982; Spark-SQL 1,235


Performance comparison : TPC-DS 200GB continued

Hive on Tez is
● 77% faster than Hive on Spark
● 10% faster than Spark-SQL

Spark-SQL is
● 60% faster than Hive on Spark
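The "X% faster" figures can be reproduced from the three chart totals as (slower / faster - 1) * 100. The sketch below assumes the totals 1,118, 1,982 and 1,235 belong to Hive on Tez, Hive on Spark and Spark-SQL respectively. That assignment is inferred from the percentages, not stated in the extracted slide text.

```python
def percent_faster(slower: float, faster: float) -> int:
    """How much faster 'faster' is than 'slower', as a rounded percentage."""
    return round((slower / faster - 1) * 100)

# Assumed mapping of chart totals to engines (inferred, see lead-in).
totals = {"Hive on Tez": 1118, "Hive on Spark": 1982, "Spark-SQL": 1235}

tez_vs_hive_on_spark = percent_faster(totals["Hive on Spark"], totals["Hive on Tez"])
tez_vs_spark_sql = percent_faster(totals["Spark-SQL"], totals["Hive on Tez"])
spark_sql_vs_hive_on_spark = percent_faster(totals["Hive on Spark"], totals["Spark-SQL"])

print(tez_vs_hive_on_spark, tez_vs_spark_sql, spark_sql_vs_hive_on_spark)
```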


Performance comparison : TPC-DS 200GB summary



Even simple queries don’t run in sub-second.


Performance comparison : TPC-DS 200GB

● 200GB Scale factor, un-partitioned schema

● 45 unmodified queries from TPC-DS

● ORC format compression ratio 3.4x

● Parquet format compression ratio of 2.8x


Performance comparison : TPC-DS 30TB

Workload config

● 30 TB scale factor
● ORC table format
● Fact tables partitioned on *_date_sk
● Explicit partition filters were used for Hive on Spark and Spark-SQL (but not for Hive on Tez)
● 20 of the previously used queries were used; warm query timings reported

Highlights from 30TB TPC-DS test

● Hive on Tez outperforms Hive on Spark and Spark-SQL by up to 18x
● Hive on Spark completed 15 of the 20 queries; the remaining 5 errored out or were stuck in GC and got cancelled
● Spark-SQL completed 7 of the 20 queries; the remaining 13 either failed within a couple of minutes or errored out after running for hours
● Spark-SQL performance is negatively affected by inefficient query plans, as it lacks a query optimizer



Total elapsed time (chart values): Hive on Tez 1,828; Hive on Spark 10,098

For large data sets, Hive on Tez is ~5x faster than Hive on Spark.


Performance comparison : TPC-DS 30TB continued

[Chart: 30TB results, with the failed Spark-SQL queries called out.]


Performance comparison : TPC-DS 30TB Q17

[Chart: Q17 at 30TB, annotated with the point where the Hive on Tez query ends.]



Why didn’t Spark take Hive to sub-second?

● Hive is CPU bound for most operations, especially after the introduction of columnar file formats (do more with less)

● Spark consumes more CPU, disk and network IO than Tez for relatively large datasets

● Hive on Spark spends a lot of time translating from RDDs to Hive’s “Row Containers”

Tez uses 2x less disk IO, 4x less network IO and 6x less CPU.


I don’t believe what you just said!!!

Show me some queries I can understand...

Simple queries to understand complex systems

Execution engine Primitives


Performance comparison : What are those primitives?

Group, test case and comment:

ETL
● Create table as select *: insert 8 billion rows, 570 GB of data
● Create table as select with group by: group by and insert 8 billion rows, 570 GB of data
● Create table as select with group by on all columns followed by cluster by: group by, cluster by and insert 8 billion rows, 570 GB of data

Group by
● Group by on primary key: group by 25 billion distinct keys
● Group by on column with low NDV*: group by 82 billion rows with 8K distinct keys

Map join
● store_sales x item: map join 28 billion x 462K
● store_sales x item x store: map join 28 billion x 462K x 1.7K
● store_sales x item x store x customer_demographics: map join 28 billion x 462K x 1.7K x 1.9 million

Shuffle join
● Shuffle join: shuffle join 8.6 billion x 706 million rows
● Shuffle join + group by on primary key: shuffle join 8.6 billion x 706 million rows followed by group by on 675 million rows

* NDV: number of distinct values


Performance comparison : CTAS

Create table test_table as select * from store_returns;

Execution engine     Elapsed time (seconds)   Tez gain %
Hive on Tez          316
Hive on Spark        351                      11%
Hive on Mapreduce    494                      56%
Spark-SQL            418                      32%

Plan: Table Scan store_returns (8 billion rows) → Table Insert (8 billion rows)


Tez is 11% faster than Hive on Spark, 56% faster than MapReduce and 32% faster than Spark-SQL.


Performance comparison : CTAS with group by

Create table test_table as select * from store_returns group by *;


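Execution engine     Elapsed time (seconds)   Tez gain %
Hive on Tez          630
Hive on Spark        1,608                    155%
Spark-SQL            1,202                    91%

Chart values (seconds): Hive on Tez 630, Hive on Spark 1,608, Hive on Mapreduce 840, Spark-SQL 1,202

Plan: Table Scan store_returns (8 billion rows) → Group by on all columns (7 billion rows) → Shuffle on all columns (8 billion rows) → Table Insert (4 billion rows)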





This time, the execution engine must prepare, shuffle and aggregate data.

Tez is 155% faster than Hive on Spark and 91% faster than Spark-SQL.





Performance comparison : Select + group by on PK

select count(*) rowcount from store_sales group by ss_item_sk , ss_ticket_number having rowcount > 100000000


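Execution engine     Elapsed time (seconds)   Tez gain %
Hive on Tez          457
Spark-SQL            862                      89%

Chart values (seconds): 457, 2,966, 893, 862

Plan: Table Scan (25 billion rows) → Group by (25 billion rows) → Shuffle (25 billion rows) → Filter operator (25 billion rows) → Select (0 rows qualify)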
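To see why this query stresses the engine, note that grouping on a primary key produces one group per input row, so nothing is reduced before the HAVING filter discards everything. A toy sketch of that behaviour, using a hypothetical in-memory stand-in for store_sales in which (ss_item_sk, ss_ticket_number) is unique:

```python
from collections import Counter

# Toy stand-in for store_sales: each (ss_item_sk, ss_ticket_number) pair
# appears exactly once, as it would for a primary key.
rows = [(item, ticket) for ticket in range(1, 4) for item in range(1, 5)]

# GROUP BY ss_item_sk, ss_ticket_number with count(*) as rowcount.
group_counts = Counter(rows)

# HAVING rowcount > 100000000 (threshold taken from the slide's query).
threshold = 100_000_000
qualifying = [key for key, count in group_counts.items() if count > threshold]

# As many groups as input rows, and no group passes the filter.
print(len(rows), len(group_counts), len(qualifying))
```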





Group-By performed on all 25 billion distinct keys.

Tez is 89% faster than Spark-SQL.





Performance comparison : Select + group by on low NDV

select sum(ss_list_price) from store_sales group by ss_sold_date_sk having sum(ss_list_price) = 1


Execution engine     Elapsed time (seconds)   Tez gain %
Hive on Tez          51
Spark-SQL            164                      221%

Chart values (seconds): 51, 290, 56, 164

Plan: Table Scan (85 billion rows) → Group by (85 billion rows) → Filter operator (8K rows) → Select (0 rows qualify)
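With few distinct keys, an engine that aggregates partially on the map side only has to ship one record per key per task, which is one plausible reason this case is so much cheaper than grouping on a primary key. The sketch below illustrates that general technique on toy data. It is not taken from the deck, and the function names are hypothetical.

```python
from collections import defaultdict

def partial_sums(split):
    """Map side: keep one running sum per distinct key seen in this split."""
    sums = defaultdict(float)
    for date_sk, list_price in split:
        sums[date_sk] += list_price
    return dict(sums)

def merge(partials):
    """Reduce side: combine the per-split sums for each key."""
    totals = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            totals[key] += value
    return dict(totals)

# Toy data: 2 splits of 6 rows each, only 3 distinct ss_sold_date_sk values.
splits = [
    [(d, 1.0) for d in (1, 2, 3, 1, 2, 3)],
    [(d, 2.0) for d in (1, 2, 3, 1, 2, 3)],
]
partials = [partial_sums(split) for split in splits]
shuffled_records = sum(len(p) for p in partials)  # records sent to the reducers
totals = merge(partials)

# HAVING sum(ss_list_price) = 1, as in the slide's query.
qualifying = [key for key, value in totals.items() if value == 1]
print(shuffled_records, totals, qualifying)
```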


Hive on Tez and Hive on Spark outperform Spark-SQL.


Performance comparison : Map join with 1,2 & 3 tables

select count(*) from store_sales, item, store, customer_demographics where i_item_sk = ss_item_sk and s_store_sk = ss_store_sk and ss_cdemo_sk = cd_demo_sk

Plan: Table Scan store_sales (28 billion rows), map-joined in turn with item (472K rows), store (1.7K rows) and customer_demographics (1.9 million rows); each map join emits 27 billion rows.

Gain % is (engine time / Tez time - 1), as in the other primitive tables; negative values mean the engine is faster than Tez.

Execution engine     Map join #1   Map join #2   Map join #3   Tez gain % #1   Tez gain % #2   Tez gain % #3
Hive on Tez          108           145           232
Hive on Spark        106           142           289           -2%             -2%             25%
Hive on Mapreduce    247           280           800           128%            93%             245%
Spark-SQL            86            117           166           -20%            -20%            -28%
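For readers unfamiliar with the term, a map join (broadcast hash join) loads the small tables into memory and streams the large table past them, so no shuffle of the large table is needed. A minimal sketch of the count query on toy data, assuming the small tables have unique keys. The function and variable names are hypothetical.

```python
def map_join_count(fact_rows, item_keys, store_keys, cdemo_keys):
    """Count fact rows that match a key in every small in-memory table."""
    # Build side: the small tables fit in memory as hash sets.
    items = set(item_keys)     # i_item_sk
    stores = set(store_keys)   # s_store_sk
    cdemos = set(cdemo_keys)   # cd_demo_sk
    count = 0
    # Probe side: stream the large table once, no shuffle required.
    for ss_item_sk, ss_store_sk, ss_cdemo_sk in fact_rows:
        if ss_item_sk in items and ss_store_sk in stores and ss_cdemo_sk in cdemos:
            count += 1
    return count

# Toy store_sales rows: (ss_item_sk, ss_store_sk, ss_cdemo_sk).
store_sales = [(1, 10, 100), (2, 10, 100), (3, 11, 101), (1, 12, 100)]
matches = map_join_count(store_sales, [1, 2], [10, 11], [100, 101])
print(matches)
```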


Spark-SQL is faster than Hive on Tez and Hive on Spark for map joins.


Performance comparison : Shuffle join + group by

● select count(*) from store_sales a, store_returns b where a.ss_item_sk = b.sr_item_sk and a.ss_ticket_number = b.sr_ticket_number

● select count(*) rowcount from store_sales a, store_returns b where a.ss_item_sk = b.sr_item_sk and a.ss_ticket_number = b.sr_ticket_number group by ss_item_sk, ss_ticket_number having rowcount > 1

Execution engine     Shuffle join   Shuffle join + group by   Tez shuffle gain %   Tez gain %
Hive on Tez          400            453
Hive on Spark        1,078          1,120                     170%                 147%
Hive on Mapreduce    756            826                       89%                  82%
Spark-SQL            1,835          1,884                     359%                 316%

Plan: Table Scan (8.6 billion rows) and Table Scan (6 million rows) → Shuffle Join (9 billion rows) → Group by (675 million rows) → Filter (675 million rows) → Select (0 rows)
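In contrast to a map join, a shuffle join repartitions both inputs by the join key so that matching rows land in the same partition, and then joins each partition locally. The sketch below shows the idea on toy data. The names are hypothetical and the partition count is arbitrary.

```python
from collections import defaultdict

def partition(rows, num_partitions):
    """Shuffle step: route each row to a partition by hashing its join key."""
    parts = [[] for _ in range(num_partitions)]
    for item_sk, ticket_number in rows:
        parts[hash((item_sk, ticket_number)) % num_partitions].append(
            (item_sk, ticket_number)
        )
    return parts

def shuffle_join_count(sales, returns, num_partitions=4):
    """Count joined rows on (item_sk, ticket_number), one partition at a time."""
    total = 0
    sales_parts = partition(sales, num_partitions)
    returns_parts = partition(returns, num_partitions)
    for sales_part, returns_part in zip(sales_parts, returns_parts):
        # Local hash join: build on the returns side, probe with sales.
        build = defaultdict(int)
        for key in returns_part:
            build[key] += 1
        for key in sales_part:
            total += build.get(key, 0)
    return total

# Toy rows: (item_sk, ticket_number).
store_sales = [(1, 1), (2, 1), (3, 2), (4, 2)]
store_returns = [(1, 1), (3, 2), (5, 9)]
joined = shuffle_join_count(store_sales, store_returns)
print(joined)
```

Both inputs cross the network in the shuffle step, which is why this primitive is sensitive to the execution engine's shuffle efficiency.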






Tez is 170% faster than Hive on Spark and 359% faster than Spark-SQL for the shuffle join (147% and 316% with the group by).

Why are shuffles so slow for Hive on Spark and Spark-SQL?


Performance comparison : Shuffle join cluster CPU utilization



[Chart: cluster CPU utilization during the shuffle join, annotated with the points where the Hive on Tez query and the Hive on Spark query end.]


Performance comparison : Primitive results summary


Performance comparison : Performance summary

Hive Tez
+ Short running query
+ ETL
+ Large joins and aggregates
+ Efficient resource utilization
- Slower than Spark-SQL in map joins

Spark-SQL
+ Map join performance
- Large joins
- High GC
- Instability
- SQL support limited compared to Hive
- Lack of sophisticated query optimizer

Hive on Spark
+ Promising initial release
+ Outperforms Spark-SQL in large join
- Slower than Tez for large joins and aggregates
- High GC

MapReduce


Solving Hive’s Top Performance Challenges


Apache Hive: Modern Architecture

[Figure: Hive architecture stack; the legend marks each component as Historical, Current or In Development.]

● SQL support: SQL:2011, Optimizer, HCatalog, HiveServer2
● SQL engines: Row Engine, Vector Engine
● Distributed execution: Hadoop 1 (MapReduce); Hadoop 2 (Tez, Spark)
● Vector cache and persistent server: LLAP
● Block cache: Linux Cache
● Storage: Columnar Storage (ORCFile, Parquet); Unstructured Data (JSON, CSV, Text, Avro, Custom, Weblog)


Apache Hive: Getting to Sub-Second

Improvements:

● LLAP: Persistent servers cache vectors and start queries instantly. Pluggable integrations with Tez or Spark.
● Vectorized hash join solves CPU-boundedness for Hive on Tez or on Spark.
● Improved metadata catalog allows instant query planning and optimization for any engine.


Apache Hive’s Sub-Second Future

Hive Metadata (Fast, Scalable Metadata Catalog) + Persistent Server (LLAP) + SQL Engine (Vectorized Hash Join) + Choice of Execution Engines (Tez or Spark) = Sub-Second


Questions?

Interested? Stop by the Hortonworks booth to learn more.


Endnotes

(1) https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/

(2) https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920

(3) http://hortonworks.com/blog/benchmarking-apache-hive-13-enterprise-hadoop/

(4) http://yahoodevelopers.tumblr.com/post/85930551108/yahoo-betting-on-apache-hive-tez-and-yarn

(5) http://www.slideshare.net/AdamKawa/a-perfect-hive-query-for-a-perfect-meeting-hadoop-summit-2014
