
Using BigBench to compare Hive and Spark

Nicolas Poggi, Alejandro Montero

April 2017

Outline

1. Intro to BSC and ALOJA
2. BigBench
3. Sequential tests
   1. Data scales
4. Concurrency tests
5. Summary

Barcelona Supercomputing Center (BSC)

• Spanish national supercomputing center with 22 years of history in:
  • Computer architecture, networking and distributed systems research
• Based at Barcelona Tech University (UPC)
• Large ongoing life science computational projects
• Prominent body of research activity around Hadoop
  • 2008-2013: SLA Adaptive Scheduler, Accelerators, Locality Awareness, Performance Management. 7 publications
  • 2013-Present: Cost-efficient upcoming Big Data architectures (ALOJA). 8+ publications

ALOJA: towards cost-effective Big Data

• Research project for automating the characterization and optimization of Big Data deployments
• Open source Benchmarking-to-Insights platform and tools
• Largest Big Data public repository (70,000+ jobs)
• Community collaboration with industry and academia

http://aloja.bsc.es

[Diagram: Big Data Benchmarking, Online Repository, Web / ML Analytics]

Benchmarking and BigBench

The need for a new benchmark standard

• A benchmark captures the solution to a problem and guides decision making
• Database-related benchmark standards:
  • Transactional (OLTP): TPC-C and TPC-E
  • Decision Support (DSS/OLAP): TPC-H and TPC-DS
  • What about Big Data analytics properties (3 Vs, ML, M/R)?
• Benchmark uses:
  • System tuning and debugging
  • Spread and broad Big Data ecosystem
• Set common rules:
  • Vendor comparison
  • Transparency across the industry

What is BigBench (TPCx-BB)?

• End-to-end, application-level benchmark
• Result of many years of collaboration between industry and academia
• Covers most Big Data analytical properties (3 Vs)
• Based on a retailer (extension of TPC-DS)

[1]: http://www.tpc.org/tpc_documents_current_versions/pdf/tpcx-bb_v1.2.0.pdf

BigBench history

BigBench use cases and process overview

• 30 business use cases covering:
  • Merchandising
  • Pricing optimization
  • Product return
  • Customers...
• Implementation resulted in:
  • 14 declarative queries (SQL)
  • 7 queries with Natural Language Processing
  • 4 queries with data preprocessing in MapReduce jobs
  • 5 queries with Machine Learning post-processing
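As a quick consistency check, the four query categories above add up to the 30 business use cases. The short Python sketch below tallies them and computes each category's share of the workload; the category labels are our own shorthand, not identifiers used by the BigBench kit.

```python
from typing import Dict

# Query mix of the BigBench workload, as listed on the slide.
# The category labels are illustrative shorthand, not names from the kit.
QUERY_MIX: Dict[str, int] = {
    "declarative (SQL)": 14,
    "natural language processing": 7,
    "MapReduce data preprocessing": 4,
    "machine learning post-processing": 5,
}


def total_queries(mix: Dict[str, int]) -> int:
    """Sum the per-category query counts."""
    return sum(mix.values())


def share(mix: Dict[str, int]) -> Dict[str, float]:
    """Fraction of the workload that each category represents."""
    total = total_queries(mix)
    return {name: count / total for name, count in mix.items()}


if __name__ == "__main__":
    print(total_queries(QUERY_MIX))  # 30, one query per business use case
    for name, frac in share(QUERY_MIX).items():
        print(f"{name}: {frac:.0%}")
```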

BigBench v1.2 – Reference Implementation

[Diagram: software stack with the layers Machine Learning, SQL Engine, Table Metastore, Execution Engine and Filesystem; components shown: HDFS, Hive Metastore, MapReduce, Tez, Spark, YARN, Hive, Spark SQL, Mahout, Custom Spark MLlib]

Benchmarked systems:
• Hive + MapReduce + Mahout
• Hive + MapReduce + Spark_MLlib
• Hive + Tez + Mahout
• Hive + Tez + Spark_MLlib
• Spark SQL + Mahout
• Spark SQL + Spark_MLlib
• Spark 2 SQL + Mahout

Work in progress:
• Hive 2
• Spark 2 SQL + Spark_MLlib
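The benchmarked systems form a grid of SQL/execution stacks crossed with ML libraries. The Python sketch below enumerates that grid and separates out the combination still in progress; it only restates the slide and is not part of the BigBench kit.

```python
from itertools import product
from typing import List, Tuple

# SQL engine / execution engine stacks and ML libraries named on the slide.
SQL_STACKS = ["Hive + MapReduce", "Hive + Tez", "Spark SQL", "Spark 2 SQL"]
ML_LIBS = ["Mahout", "Spark_MLlib"]

# Combination listed as work in progress. (Hive 2 is also pending, but the
# slide does not pair it with a specific ML library, so it is left out here.)
IN_PROGRESS = {("Spark 2 SQL", "Spark_MLlib")}


def benchmark_matrix() -> Tuple[List[str], List[str]]:
    """Return (benchmarked, pending) lists of 'stack + ML library' labels."""
    benchmarked: List[str] = []
    pending: List[str] = []
    for stack, ml in product(SQL_STACKS, ML_LIBS):
        label = f"{stack} + {ml}"
        if (stack, ml) in IN_PROGRESS:
            pending.append(label)
        else:
            benchmarked.append(label)
    return benchmarked, pending


if __name__ == "__main__":
    done, todo = benchmark_matrix()
    print("Benchmarked:", *done, sep="\n  ")
    print("Work in progress:", *todo, sep="\n  ")
```

Because `product` iterates the ML libraries fastest, the benchmarked list comes out in the same order as on the slide.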

The cluster (I) – HDInsight PaaS

Model                           D4v2
# Head nodes                    2
# Working nodes                 4
# Zookeeper nodes               3
CPU                             Intel(R) Xeon(R) CPU E5-2673 v3, 8 x 2.4 GHz cores
RAM                             28 GB
HDFS                            Remote
Software                        Hortonworks Data Platform 2.5

Software configuration
Mapper/Reducer/Tez memory       1536 MB
Mapper/Reducer/Tez heap space   1024 MB
Mapper/Reducer/Tez cores        1
Hive MapJoins                   Yes*
Spark executors                 4
Spark executor memory           4608 MB
Spark executor cores            3

* Non-conditional map join conversion for small tables under 319 MB
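The table lists values only. As a hedged sketch, assuming the usual Hadoop, Hive and Spark property names, the configuration could be expressed as below; which exact properties the authors set is not stated on the slide, and the byte conversion of the 319 MB threshold assumes MiB. The helper also shows why a 4608 MB Spark executor asks YARN for more than 4608 MB: Spark 1.6/2.0 on YARN adds a default off-heap overhead of max(384 MB, 10% of executor memory).

```python
from typing import Dict, Union

MB = 1024 * 1024  # assumption: the slide's "MB" means MiB

# Hypothetical mapping of the cluster (I) table onto standard property names.
CLUSTER_I: Dict[str, Union[int, str]] = {
    # MapReduce / Tez container sizes, JVM heaps and vcores
    "mapreduce.map.memory.mb": 1536,
    "mapreduce.reduce.memory.mb": 1536,
    "hive.tez.container.size": 1536,
    "mapreduce.map.java.opts": "-Xmx1024m",
    "mapreduce.reduce.java.opts": "-Xmx1024m",
    "hive.tez.java.opts": "-Xmx1024m",
    "mapreduce.map.cpu.vcores": 1,
    "mapreduce.reduce.cpu.vcores": 1,
    "hive.tez.cpu.vcores": 1,
    # Non-conditional map join conversion for small tables (< 319 MB, in bytes)
    "hive.auto.convert.join": "true",
    "hive.auto.convert.join.noconditionaltask": "true",
    "hive.auto.convert.join.noconditionaltask.size": 319 * MB,
    # Spark executors
    "spark.executor.instances": 4,
    "spark.executor.memory": "4608m",
    "spark.executor.cores": 3,
}


def heap_fraction(container_mb: int, heap_mb: int) -> float:
    """Share of a YARN container given to the JVM heap."""
    return heap_mb / container_mb


def spark_yarn_request_mb(executor_mb: int, factor: float = 0.10, minimum: int = 384) -> int:
    """Executor memory plus the default Spark 1.6/2.0 YARN overhead,
    max(384 MB, 10% of executor memory). YARN may round the result up to
    its minimum allocation increment, which is not modeled here."""
    overhead = max(minimum, int(factor * executor_mb))
    return executor_mb + overhead


if __name__ == "__main__":
    print(f"heap/container: {heap_fraction(1536, 1024):.2f}")
    print("Spark executor container request:", spark_yarn_request_mb(4608), "MB")
```

With these values the MapReduce/Tez heap takes about two thirds of its 1536 MB container, and each Spark executor requests 5068 MB from YARN before any rounding.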

Sequential runs (power)

Queries 1-30
Average of three executions at 100 GB Scale Factor

BigBench workload – power test

[Diagram: Data Generation → HDFS → Load to Hive Metastore (Hive) → Query 1 → Query 2 → … → Query 30]
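A power test runs the queries strictly one after another once data generation and loading are done. The Python sketch below times such a sequential run and averages each query over repeated full executions, mirroring the "average of three executions" used in the charts. `run_query` is a hypothetical hook standing in for the actual Hive or Spark SQL invocation; the sketch covers only the query phase.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable, List


def power_test(run_query: Callable[[int], None],
               queries: Iterable[int] = range(1, 31),
               repetitions: int = 3) -> Dict[int, float]:
    """Run the full query sequence `repetitions` times and return the
    average wall-clock time per query, in seconds.

    `run_query` is a hypothetical callable that executes one BigBench query
    and blocks until it has finished.
    """
    timings: Dict[int, List[float]] = {q: [] for q in queries}
    for _ in range(repetitions):      # one complete power test per repetition
        for q in timings:             # queries in order, one at a time
            start = time.perf_counter()
            run_query(q)
            timings[q].append(time.perf_counter() - start)
    return {q: mean(t) for q, t in timings.items()}
```

Repeating the whole sequence, rather than running each query three times back to back, keeps caching effects closer to those of a single power run.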

Pure QL

Average of three executions using 100 GB Scale Factor

Query 12 CPU behavior

[Chart series: Tez, Spark 1.6.2, Spark 2.0.2]
Average of three executions using 100 GB Scale Factor

Custom Reducers

Average of three executions using 100 GB Scale Factor

Query 2 CPU behavior

[Chart series: Tez, Spark 1.6.2, Spark 2.0.2]
Average of three executions using 100 GB Scale Factor

Natural Language Processing

Average of three executions using 100 GB Scale Factor

Query 27 CPU behavior

[Chart series: Tez, Spark 1.6.2, Spark 2.0.2]
Average of three executions using 100 GB Scale Factor

Machine Learning

Average of three executions using 100 GB Scale Factor

Query 5 CPU behavior

[Chart series: Tez+Mahout, Tez+Spark_MLlib]
Average of three executions using 100 GB Scale Factor

Aggregated Results

Average of three executions using 100 GB Scale Factor
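The slides do not state how the per-query averages are combined into the aggregated view. As an illustration only, the Python sketch below contrasts two common choices, the total of per-query times and their geometric mean, and computes per-query speedups between two engines. Inputs are dictionaries mapping query number to average seconds (all times must be positive).

```python
import math
from typing import Dict


def total_time(times: Dict[int, float]) -> float:
    """Sum of per-query times: dominated by the longest-running queries."""
    return sum(times.values())


def geometric_mean(times: Dict[int, float]) -> float:
    """Geometric mean of per-query times: every query weighs equally in
    relative terms, so one long query cannot dominate the comparison."""
    return math.exp(sum(math.log(t) for t in times.values()) / len(times))


def speedups(baseline: Dict[int, float], candidate: Dict[int, float]) -> Dict[int, float]:
    """Per-query speedup of `candidate` over `baseline` (> 1 means faster)."""
    return {q: baseline[q] / candidate[q] for q in baseline if q in candidate}
```

With hypothetical timings where one engine is twice as fast on one query and twice as slow on another, the two geometric means come out equal while the totals differ, so it matters which aggregate a chart reports.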

Scaling from 1GB to 1TB

Log scales
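Log scales make it easy to read how run time grows with data size between 1 GB and 1 TB. One hedged way to put a number on that growth is the slope of a log-log fit, computed by the Python sketch below; the function name and the interpretation in the docstring are our own, not something the deck defines.

```python
import math
from typing import Sequence


def scaling_exponent(scale_factors_gb: Sequence[float], times_s: Sequence[float]) -> float:
    """Least-squares slope of log10(time) versus log10(scale factor).

    A slope near 1 means run time grows linearly with data size; below 1,
    sublinearly (for example when fixed start-up costs dominate small
    scales); above 1, superlinearly.
    """
    if len(scale_factors_gb) != len(times_s) or len(times_s) < 2:
        raise ValueError("need at least two matching (scale, time) points")
    xs = [math.log10(s) for s in scale_factors_gb]
    ys = [math.log10(t) for t in times_s]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("scale factors must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```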

Concurrency runs (throughput)

2, 4, 8 parallel streams

BigBench workload – Throughput test

[Diagram: Data Generation → HDFS → Load Data (Hive), followed by parallel query streams:
  Query 15, Query 21, …, Query 16
  Query 12, Query 18, …, Query 22
  Query 16, Query 30, …, Query 29]
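In the throughput test several query streams run at the same time, each issuing the queries in a different order. The Python sketch below imitates that structure with one thread per stream. `run_query` is again a hypothetical hook, and the per-stream orderings are seeded random permutations chosen for illustration, not the orderings defined by the TPCx-BB kit.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def make_streams(n_streams: int, queries: Sequence[int] = tuple(range(1, 31)),
                 seed: Optional[int] = 42) -> List[List[int]]:
    """Give each stream its own permutation of the query list
    (random permutations are an illustrative assumption)."""
    rng = random.Random(seed)
    streams = []
    for _ in range(n_streams):
        order = list(queries)
        rng.shuffle(order)
        streams.append(order)
    return streams


def throughput_test(run_query: Callable[[int], None], streams: List[List[int]]) -> float:
    """Run all streams concurrently (queries within a stream run in order)
    and return the elapsed wall-clock time in seconds."""
    def run_stream(order: List[int]) -> None:
        for q in order:
            run_query(q)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(streams)) as pool:
        # list() waits for every stream and re-raises any stream's exception
        list(pool.map(run_stream, streams))
    return time.perf_counter() - start
```

Threads are adequate here because `run_query` would mostly wait on an external engine; for the 2, 4 and 8 stream runs one would call `throughput_test(run_query, make_streams(n))` for each `n`.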

The cluster (II) – HDInsight PaaS

Model                           HDInsight D4v3
# Head nodes                    2
# Working nodes                 7
# Zookeeper nodes               3
CPU                             Intel(R) Xeon(R) CPU E5-2673 v3, 8 x 2.4 GHz cores
RAM                             28 GB (55 GB on head nodes)
HDFS                            Remote
Software                        Hortonworks Data Platform 2.5

Software configuration
Mapper/Reducer/Tez memory       1536 MB
Mapper/Reducer/Tez heap space   1024 MB
Mapper/Reducer/Tez cores        1
Hive MapJoins                   Yes*
Spark executors                 9
Spark executor memory           4608 MB
Spark executor cores            3

* Non-conditional map join conversion for small tables under 319 MB
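Cluster (II) keeps the per-executor settings of cluster (I) but adds working nodes and Spark executors. The Python sketch below multiplies out the table values to compare total Spark capacity; it counts only what the tables list (no YARN overhead, no head nodes) and assumes the 8 CPU cores listed apply to each working node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SparkAllocation:
    """Spark executor layout as listed in the cluster tables."""
    workers: int
    cores_per_worker: int
    executors: int
    executor_cores: int
    executor_memory_mb: int

    def executor_cores_total(self) -> int:
        return self.executors * self.executor_cores

    def executor_memory_total_mb(self) -> int:
        return self.executors * self.executor_memory_mb

    def worker_cores_total(self) -> int:
        return self.workers * self.cores_per_worker


CLUSTER_I = SparkAllocation(workers=4, cores_per_worker=8, executors=4,
                            executor_cores=3, executor_memory_mb=4608)
CLUSTER_II = SparkAllocation(workers=7, cores_per_worker=8, executors=9,
                             executor_cores=3, executor_memory_mb=4608)

if __name__ == "__main__":
    for name, c in (("I", CLUSTER_I), ("II", CLUSTER_II)):
        print(f"cluster ({name}): {c.executor_cores_total()} executor cores of "
              f"{c.worker_cores_total()} worker cores, "
              f"{c.executor_memory_total_mb()} MB executor memory")
```

For cluster (I) this gives 12 executor cores and 18,432 MB of executor memory; for cluster (II), 27 cores and 41,472 MB.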

Spark vs Hive + Tez in throughput tests
