Shark - Hive on Spark

7/29/2019 Shark - Hive on Spark

1/48

Shark

CliffEngle,AntonioLupher,ReynoldXin,MateiZaharia,MichaelFranklin,IonStoica,

ScottShenker

HiveonSpark


2/48

Agenda

IntrotoSpark ApacheHive Shark SharksImprovementsoverHive Demo Alphastatus Futuredirections


3/48

WhatSparkIs

NotawrapperaroundHadoop Separate,fast,MapReduce-likeengine

In-memorydatastorageforveryfastiterativequeries

PowerfuloptimizationandschedulingInteractiveusefromScalainterpreter

CompatiblewithHadoopsstorageAPIsCanread/writetoanyHadoop-supportedsystem,includingHDFS,HBase,SequenceFiles,etc


4/48

ProjectHistory

Startedinsummerof2009 Opensourcedinearly2010 InuseatUCBerkeley,UCSF,Princeton,Klout,Conviva,Quantifind,Yahoo!Research


5/48

SparkProgrammingModel

Resilientdistributeddatasets(RDDs)

DistributedcollectionsofScalaobjects

Canbecachedinmemoryacrossclusternodes

ManipulatedlikelocalScalacollectionsAutomaticallyrebuiltonfailure


6/48

Example:LogMining

Loaderrormessagesfromalogintomemory,theninteractivelysearchforvariouspatterns

lines = spark.textFile(hdfs://...)

errors = lines.filter(_.startsWith(ERROR))messages = errors.map(_.split(\t)(2))

cachedMsgs = messages.cache()

Block1

Block2

Block3

Worker

Worker

Worker

Driver

cachedMsgs.filter(_.contains(foo)).count

cachedMsgs.filter(_.contains(bar)).count. . .

tasks

results

Cache1

Cache2

Cache3

BaseRDD

TransformedRDD

Action

Result:full-textsearchofWikipediain


7/48

ApacheHive

DatawarehousesolutiondevelopedatFacebook

SQL-likelanguagecalledHiveQLtoquerystructureddatastoredinHDFS

QueriescompiletoHadoopMapReducejobs


8/48


9/48

HivePrinciples

SQLprovidesafamiliarinterfaceforusers

Extensibletypes,functions,andstorageformats

Horizontallyscalablewithhighperformanceonlargedatasets


10/48

HiveApplications

Reporting Adhocanalysis ETLformachinelearning


11/48

HiveDownsides Notinteractive

Hadoopstartuplatencyis~20seconds,evenforsmalljobs

NoquerylocalityIfqueriesoperateonthesamesubsetofdata,theystillrunfromscratch

Readingdatafromdiskisoftenbottleneck Requiresseparatemachinelearningdataflow


12/48

SharkMotivation

Exploittemporallocality:workingsetofdatacanoftenfitinmemorytobe

reusedbetweenqueries Providelowlatencyonsmallqueries IntegratedistributedUDFsintoSQL


13/48

IntroducingShark

Shark=Spark+ Hive

RunHiveQLqueriesthroughSparkwithHiveUDF,UDAF,SerDeUtilizeSparksin-memoryRDDcachingandflexiblelanguagecapabilities


14/48

SharkintheAMPStack

Mesos

Spark

PrivateCluster AmazonEC2

Hadoop

MPI

Bagel(Pregelon

Spark)Shark

DebugTools

Streaming

Spark


15/48

Shark

PhysicaloperatorsusingSpark RelyonSparksfastexecution,faulttolerance,andin-memoryRDDs

ReuseasmuchHivecodeaspossibleConvertlogicalqueryplangeneratedfromHiveintoSparkexecutiongraph

CompatiblewithHiveRunHiveQLqueriesonexistingHDFSdatausingHivemetadata,withoutmodifications


16/48


17/48

0.1alpha:84%Hivetestspassing

(575outof683)

http://github.com/amplab/shark


18/48

0.1alpha ExperimentalSQL/RDDintegration UserselectedcachingwithCTASColumnarRDDcache

Someperformanceimprovements


19/48

Caching

UserselectedcachingwithCTASCREATETABLEmytable_cachedASSELECT*FROMmytableWHEREcount>10;

mytable_cachedbecomesanin-memorymaterializedviewthatcanbeusedtoanswerqueries.


20/48

SQL/SparkIntegration

AllowuserstoimplementsophisticatedalgorithmsasUDFsinSpark

QueryprocessingUDFsarestreamlinedval rdd = sc.sql2rdd("select foo, count(*) c from pokes group by foo")

println(rdd.count)println(rdd.mapRows(_.getInt(c)).reduce(_+_))


21/48

Performanceoptimizations

Hash-basedshuffle(speedsupgroup-bys)Hadoopdoessort-basedshufflewhichcanbeexpensive.

Limitpush-downinorderbyselect*frompokesorderbyfoolimit10

Columnarcache


22/48

LargeCachesinJVM

Javaobjectshavelargeoverhead(2-4xtheactualdatasize)

Boxing/UnboxingisslowGCisabottleneckifwehavemanylong-livedobjects


23/48

Example:CachingRows

Int,Double,Boolean

25 10.7 True

24+24+16=

64bytes*

Shouldbe~13bytes

*Estimatedon64bitVM.Implementationsmayvary.


24/48

PossibleSolutions

1. StoredeserializedJavaobjects2. Serializeobjectstoin-memory

bytearray

3. UsememoryslabexternaltoJVM

4. Storedatainprimitivearrays


25/48

DeserializedJavaObjects

+Fast(requireslittleCPU)

(Upto5Xspeedup)

-Poormemoryusage(3Xworse)

-LotsofGarbageCollectionoverhead


26/48

SerializetoBytes

+Bestmemoryefficiency

+EfficientGC-CPUusagetoserialize/

deserializeeachobj


27/48

StoreinExternalSlab

+Efficientmemory

+NoGC-CPUusagetoserialize/

deserializeeachobj-Moredifficulttoimplement


28/48

PrimitiveColumnArrays

+Efficientmemory

(similartobytearrays)+EfficientGC

+NoextraCPUoverhead


29/48

ColumnvsRowStorage

1

Column Row2 3 1

john mike sally

4.1 3.5 6.4

john 4.1

2 mike 3.5

3 sally 6.4


30/48

ColumnarBenefits

BettercompressionFewerobjectstotrackOnlymaterializenecessarycolumns=>LessCPU


31/48

ColumnarinShark

Storeallprimitivecolumnsinprimitivearraysoneachnode

LessdeserializationoverheadSpaceefficientandeasiertogarbagecollect


32/48

Result

Achievedefficientstoragespacewithoutsacrificingperformance


33/48

LimitPushdowninSort

SELECT*FROMtableORDERBYkeyLIMIT10;


34/48

MainIdea

Thelimitcanbeappliedbeforesortingalldatatoincrease

performance.EachmappercomputesitsTopKandthenasinglereducermergesthese


35/48

PossibleImplementations

1.PriorityQueue(Heap)tokeeprunningtopkinonetraversalon

eachmapper.

O(nlogk)


36/48

PossibleImplementations

2.QuickSelectalgorithm.Essentiallyquicksort,butprunes

elementsononesideofpivotwhenpossible.

O(n)


37/48

Result

WechosetouseGoogleGuavasimplementationofQuickSelect.

WeobservedmuchbetterperformancethanHivewhich

appliesthelimitaftersortingoneachreducer.


38/48

Demo

3-nodeEC2largecluster(2virtualcores,7GBofRAM)

SyntheticTPC-Hdata(2xscalefactor)


39/48

SomenumberswhilewewaitforHivetofinishrunning

Brown/Stonebrakerbenchmark70GB1AlsousedonHivemailinglist2

10AmazonEC2HighMemoryNodes(30GBofRAM/node)

NaivelycacheinputtablesCompareSharktoHive0.7

1http://database.cs.brown.edu/projects/mapreduce-vs-dbms/2https://issues.apache.org/jira/browse/HIVE-3961


40/48

Benchmarks:Query1

SELECT * FROM grep WHERE field LIKE %XYZ%;

30GBinputtable


41/48

Benchmark:Query2

5GBinputtableSELECT pagerank, pageURL FROM rankings WHEREpagerank > 10;


42/48

Status

Passing575Hivetestsoutof683.Youcanhelpusdecidewhattoimplementnext.


43/48

UnsupportedHiveFeatures

ADDFILE(useHadoopdistributedcachetodistributeuser-definedscripts)

STREAMTABLEhintnotsupportedOuterjoinwithnon-equijoinfilter

Tablestatistics(piggybackfilesink) Tablewithbuckets ORDERBYwithmultiplereducershasabug(willbefixedthisweek!)

MapJoinhasaserializationbug(willbefixedthisweek!)


44/48

UnsupportedHiveFeatures

Automaticallyconvertjoinstomapjoins Sort-mergemapjoins Partitionswithdifferentinputformats Uniquejoins WritingtotwoormoretablesinasingleSELECTINSERT

Archivingdata virtualcolumns(INPUT__FILE__NAME,BLOCK__OFFSET__INSIDE__FILE) Mergesmallfilesfromoutput


45/48

Whatscomingup(inamonth)?

Performanceimprovements(groupbys,joins)DemoofSharkona100-nodeclusterinSIGMOD2012(May22,Scottsdale,AZ)miningWikipediavisitlogdata


46/48

Whatscomingup(ayear)?

Shark-specificqueryoptimizer Automatictuningofparameters(e.g.numberofreducers)

Off-heapcaching(e.g.EhCache) AutomaticcachingbasedonqueryanalysisSharedcachinginfrastructure


47/48

Conclusion

SharkisawarehousesolutioncompatiblewithApacheHive

Canbeordersofmagnitudefaster Wearelookingforpeopletotryandwearehappytoprovidesupport


48/48

Thankyou!

Questions?

http://shark.cs.berkeley.edu(willbeupdatedtonight)

Documents

Shark - Hive on Spark