Using BigBench to compare Hive and Spark
Nicolas Poggi, Alejandro Montero
April 2017

Outline
1. Intro to BSC and ALOJA
2. BigBench
3. Sequential tests
   1. Data scales
4. Concurrency tests
5. Summary

Barcelona Supercomputing Center (BSC)
• Spanish national supercomputing center, 22 years history in:
  • Computer architecture, networking and distributed systems research
  • Based at BarcelonaTech University (UPC)
  • Large ongoing life science computational projects
• Prominent body of research activity around Hadoop
  • 2008-2013: SLA Adaptive Scheduler, Accelerators, Locality Awareness, Performance Management. 7 publications
  • 2013-Present: Cost-efficient upcoming Big Data architectures (ALOJA). 8+ publications

ALOJA: towards cost-effective Big Data
• Research project for automating characterization and optimization of Big Data deployments
• Open source Benchmarking-to-Insights platform and tools
• Largest Big Data public repository (70,000+ jobs)
• Community collaboration with industry and academia
http://aloja.bsc.es
[Diagram: Big Data Benchmarking | Online Repository | Web / ML Analytics]

The need for a new benchmark standard
• A benchmark captures the solution to a problem and guides decision making
• Database related benchmark standards
  • Transactional (OLTP): TPC-C and E
  • Decision Support (DSS/OLAP): TPC-H and DS
• And for Big Data analytics properties?
  • 3 Vs, ML, M/R
• Benchmark uses:
  • System tuning and debugging
  • Spread and broad Big Data ecosystem
  • Set common rules
  • Vendor comparison
  • Transparency across the industry

What is BigBench (TPCx-BB [1])?
• End-to-end application level benchmark
  • result of many years of collaboration
  • industry and academia
• Covers most Big Data Analytical properties (3Vs)
• Based on a retailer company (extension of TPC-DS)
[1]: http://www.tpc.org/tpc_documents_current_versions/pdf/tpcx-bb_v1.2.0.

BigBench history

BigBench use cases and process overview
• 30 business use cases covering:
  • Merchandising
  • Pricing Optimization
  • Product Return
  • Customers
  • ...
• Implementation resulted in:
  • 14 Declarative queries (SQL)
  • 7 Queries with Natural Language Processing
  • 4 Queries with data preprocessing with MapReduce jobs
  • 5 Queries with Machine Learning post processing

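As a quick consistency check, the four implementation styles listed above should add up to the 30 business use cases. The sketch below only restates the counts from the slide; the dictionary keys are illustrative names, not identifiers from the BigBench kit.

```python
# Query counts per implementation style, as listed on the slide.
# The key names are illustrative labels, not BigBench identifiers.
query_breakdown = {
    "declarative_sql": 14,
    "natural_language_processing": 7,
    "mapreduce_preprocessing": 4,
    "machine_learning_postprocessing": 5,
}

# The styles should cover all 30 use cases.
total_queries = sum(query_breakdown.values())

# Share of each style over the whole workload, in percent.
shares = {
    name: round(100 * count / total_queries, 1)
    for name, count in query_breakdown.items()
}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

Under these counts, slightly less than half of the workload is plain SQL, so the non-SQL parts (NLP, MapReduce and ML) carry real weight when comparing engines.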
BigBench v1.2 – Reference Implementation
[Diagram: reference implementation stack]
  Machine Learning: Mahout ML, Custom, Spark MLlib
  SQL Engine: Hive, Spark SQL
  Execution Engine: MapReduce, Tez, Spark (on Yarn)
  Table Metastore: Hive Metastore
  File system: HDFS
Benchmarked systems:
• Hive + MapReduce + Mahout
• Hive + MapReduce + Spark_MLlib
• Hive + Tez + Mahout
• Hive + Tez + Spark_MLlib
• Spark SQL + Mahout
• Spark SQL + Spark_MLlib
• Spark 2 SQL + Mahout
Work in progress:
• Hive 2
• Spark 2 SQL + Spark_MLlib

The cluster (I) – HDInsight PaaS
Model:              D4v2
# Head nodes:       2
# Working nodes:    4
# Zookeeper nodes:  3
CPU:                Intel(R) Xeon(R) CPU E5-2673 v3, 8 x 2.4 GHz cores
RAM:                28 GB
HDFS:               Remote
Software:           HortonWorks Data Platform 2.5
Software configuration
Mapper/Reducer/Tez memory:      1536 MB
Mapper/Reducer/Tez Heap Space:  1024 MB
Mapper/Reducer/Tez Cores:       1
Hive MapJoins:                  Yes (non-conditional map join conversion with small tables lesser than 319 MB)
Spark executors:                4
Spark executor memory:          4608 MB
Spark executor Cores:           3

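The Spark rows of this table can be expressed with standard Spark configuration properties. The sketch below is an assumption about how such values might be passed, not the authors' actual launch script: it maps the three Spark values onto `spark.executor.instances`, `spark.executor.memory` and `spark.executor.cores`, and the helper `to_submit_args` is a hypothetical name introduced here.

```python
# Spark values taken from the cluster (I) table; the mapping to these
# property names is an assumption about how they were applied.
spark_settings = {
    "spark.executor.instances": 4,
    "spark.executor.memory": "4608m",
    "spark.executor.cores": 3,
}


def to_submit_args(settings):
    """Render settings as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


submit_args = to_submit_args(spark_settings)

# Aggregate resources requested by the executors alone.
total_executor_cores = (
    spark_settings["spark.executor.instances"]
    * spark_settings["spark.executor.cores"]
)
total_executor_memory_mb = spark_settings["spark.executor.instances"] * 4608

print(" ".join(submit_args))
print(total_executor_cores, "cores,", total_executor_memory_mb, "MB")
```

With 4 executors of 3 cores each, Spark requests 12 cores in total, while the 4 working nodes expose 8 cores each according to the table.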
Sequential runs (power)
Queries 1-30
Average of three executions of 100 GB Scale Factor

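Reporting the average of three executions per query can be sketched as follows. The function name `average_runtimes` and the timings are placeholders invented for illustration; they are not measurements from this study.

```python
from statistics import mean


def average_runtimes(runs, expected_runs=3):
    """Return {query: mean seconds}, requiring expected_runs timings per query."""
    averages = {}
    for query, timings in runs.items():
        if len(timings) != expected_runs:
            raise ValueError(f"{query}: expected {expected_runs} runs, got {len(timings)}")
        averages[query] = mean(timings)
    return averages


# Placeholder timings in seconds, invented for illustration only.
example_runs = {
    "q01": [100.0, 110.0, 120.0],
    "q02": [200.0, 200.0, 230.0],
}

averages = average_runtimes(example_runs)
print(averages)
```

Rejecting queries with a missing run keeps a failed execution from silently skewing the reported average.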
BigBench workload – power test
[Diagram: Data Generation (HDFS) → Load to Hive Metastore (Hive) → Query 1 → Query 2 → … → Query 30]

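The power test runs the 30 queries one after another on an otherwise idle system. A minimal sketch of such a sequential driver is shown below; `run_power_test` and `fake_run_query` are hypothetical names, and the stand-in query function does no real work (the actual BigBench kit ships its own driver).

```python
import time


def run_power_test(run_query, first=1, last=30):
    """Run queries first..last one at a time; return {query_number: seconds}."""
    timings = {}
    for number in range(first, last + 1):
        start = time.perf_counter()
        run_query(number)  # blocks until the query finishes
        timings[number] = time.perf_counter() - start
    return timings


def fake_run_query(number):
    """Stand-in for submitting query `number` to Hive or Spark SQL."""
    return None


timings = run_power_test(fake_run_query)
print(len(timings), "queries timed")
```

Because each query starts only after the previous one finishes, per-query timings from this phase reflect single-query latency rather than behaviour under contention.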
Query 12 CPU behavior
[Figure panels: Tez | Spark 1.6.2 | Spark 2.0.2]
Average of three executions using 100 GB Scale Factor

Query 2 CPU behavior
[Figure panels: Tez | Spark 1.6.2 | Spark 2.0.2]
Average of three executions using 100 GB Scale Factor

Query 27 CPU behavior
[Figure panels: Tez | Spark 1.6.2 | Spark 2.0.2]
Average of three executions using 100 GB Scale Factor

Query 5 CPU behavior
[Figure panels: Tez + Mahout | Tez + Spark_MLlib]
Average of three executions using 100 GB Scale Factor

BigBench workload – Throughput test
[Diagram: Data Generation (HDFS) → Load Data (Hive) → concurrent query streams]
  Stream: Query 15, Query 21, …, Query 16
  Stream: Query 12, Query 18, …, Query 22
  Stream: Query 16, Query 30, …, Query 29

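In the throughput test several query streams run at the same time, each in a different query order. The sketch below illustrates the idea with threads; the names `make_streams`, `run_stream` and `run_throughput_test` are hypothetical, and the seeded random shuffle is an assumption made for illustration (the benchmark kit defines its own per-stream orders).

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_streams(num_streams, num_queries=30, seed=0):
    """Build one shuffled query order per stream (illustrative orders only)."""
    rng = random.Random(seed)
    streams = []
    for _ in range(num_streams):
        order = list(range(1, num_queries + 1))
        rng.shuffle(order)
        streams.append(order)
    return streams


def run_stream(run_query, order):
    """Run one stream's queries back to back; return how many completed."""
    for number in order:
        run_query(number)
    return len(order)


def run_throughput_test(run_query, streams):
    """Run all streams concurrently, one worker thread per stream."""
    with ThreadPoolExecutor(max_workers=len(streams)) as pool:
        futures = [pool.submit(run_stream, run_query, order) for order in streams]
        return [future.result() for future in futures]


streams = make_streams(3)
completed = run_throughput_test(lambda number: None, streams)
print(completed)
```

Each stream still executes its own queries sequentially; the concurrency comes from the streams competing for the same cluster resources.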
The cluster (II) – HDInsight PaaS
Model:              HDInsight D4v3
# Head nodes:       2
# Working nodes:    7
# Zookeeper nodes:  3
CPU:                Intel(R) Xeon(R) CPU E5-2673 v3, 8 x 2.4 GHz cores
RAM:                28 GB; 55 GB (Headnode)
HDFS:               Remote
Software:           HortonWorks Data Platform 2.5
Software configuration
Mapper/Reducer/Tez memory:      1536 MB
Mapper/Reducer/Tez Heap Space:  1024 MB
Mapper/Reducer/Tez Cores:       1
Hive MapJoins:                  Yes (non-conditional map join conversion with small tables lesser than 319 MB)
Spark executors:                9
Spark executor memory:          4608 MB
Spark executor Cores:           3

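The Hive, MapReduce and Tez rows of this table roughly correspond to well-known Hadoop and Hive properties. The sketch below shows one possible mapping as Hive `SET` statements. It is an assumption rather than the configuration files used in the study, and it treats the slide's "MB" as MiB when converting the 319 MB map join threshold to bytes.

```python
MB = 1024 * 1024  # assumption: "MB" on the slide means MiB

container_mb = 1536          # Mapper/Reducer/Tez memory
heap_mb = 1024               # Mapper/Reducer/Tez heap space
map_join_threshold_mb = 319  # small-table limit for map join conversion

# Assumed mapping from the table rows to Hadoop/Hive property names.
hive_settings = {
    "mapreduce.map.memory.mb": container_mb,
    "mapreduce.reduce.memory.mb": container_mb,
    "mapreduce.map.java.opts": f"-Xmx{heap_mb}m",
    "mapreduce.reduce.java.opts": f"-Xmx{heap_mb}m",
    "hive.tez.container.size": container_mb,
    "hive.tez.java.opts": f"-Xmx{heap_mb}m",
    "hive.auto.convert.join": "true",
    "hive.auto.convert.join.noconditionaltask": "true",
    "hive.auto.convert.join.noconditionaltask.size": map_join_threshold_mb * MB,
}

# Render the settings as statements that could be run in a Hive session.
set_statements = [f"SET {key}={value};" for key, value in hive_settings.items()]

# Fraction of each container that is given to the JVM heap.
heap_fraction = heap_mb / container_mb

print("\n".join(set_statements))
```

With these values the JVM heap takes two thirds of each 1536 MB container, leaving the remainder for non-heap memory.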