Big Data Router for Real-Time Analytics

Real-Time Analytics – How It Started…

Battlefield 3 Player Statistics

• EA collected 50 TB/day in 2013.
• Available player stats sites:
  • http://battlelog.battlefield.com
  • http://bf3stats.com
• Features per-gun/vehicle/class leaderboards etc.
• Geo-leaderboards were introduced when Battlefield 4 was released in November 2013.
• Lacks interesting analysis!

Harvested Player Data from bf3stats.com

• Roughly 2 million player records
• Each player record has 1076 fields
• Effectively a spreadsheet with 2 billion cells

Details:
• Each player record has a country field.
• Each player record has fields for all assault rifles: AK-74, M416, M16, AEK-971, F2000, FAMAS, AUG A3, KH2002, AN-94, G3A3, SCAR-L, L85A2

Question

For each country and assault rifle: what percentage of players have that assault rifle as their favorite assault rifle?

• bf3stats (MongoDB): >1 h
• BioCAM RAW: 37 milliseconds

[Bar chart: Favorite Assault Rifle query time in log10(milliseconds): bf3stats (MongoDB) 6.56, BioCAM RAW 1.57]

Extract from the Analysis

country_name        AK-74    M416     M16      AEK-971  F2000    FAMAS    AUG A3   KH2002   AN-94    G3A3     SCAR-L   L85A2
Sweden              12.31%   20.98%   27.32%   19.13%   7.43%    3.65%    2.26%    1.87%    1.20%    2.11%    0.39%    1.34%
United States       11.19%   23.68%   25.80%   16.53%   8.05%    4.26%    2.63%    1.71%    1.45%    2.26%    0.62%    1.83%
Russian Federation  22.95%   12.96%   22.35%   26.44%   6.09%    1.85%    1.85%    1.76%    1.57%    1.18%    0.35%    0.66%
France              11.72%   17.02%   33.34%   14.88%   8.79%    6.71%    2.15%    1.79%    0.90%    1.34%    0.35%    1.01%
United Kingdom      13.34%   21.40%   26.52%   16.34%   7.68%    4.03%    2.45%    1.65%    1.05%    1.72%    0.43%    3.40%

Conclusion: Players prefer the weapons used by their own country's armed forces!
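The percentages in the table above answer the question "which share of a country's players has each rifle as favorite". A minimal Python sketch of that computation follows. It is not the BioCAM or bf3stats implementation: the player records are made up, only four rifles are included, and "favorite" is assumed here to mean the rifle with the highest per-player count, which the slides do not define.

```python
from collections import Counter, defaultdict

# Subset of the rifle fields, for brevity.
RIFLES = ["AK-74", "M416", "M16", "AEK-971"]

# Hypothetical player records: a country field plus one numeric field per rifle.
players = [
    {"country": "Sweden", "AK-74": 10, "M416": 50, "M16": 20, "AEK-971": 5},
    {"country": "Sweden", "AK-74": 3, "M416": 8, "M16": 40, "AEK-971": 1},
    {"country": "France", "AK-74": 0, "M416": 2, "M16": 90, "AEK-971": 7},
    {"country": "Sweden", "AK-74": 70, "M416": 8, "M16": 4, "AEK-971": 1},
    {"country": "Sweden", "AK-74": 1, "M416": 8, "M16": 40, "AEK-971": 1},
]


def favorite_rifle(player):
    # Assumption: the favorite is the rifle with the largest value.
    return max(RIFLES, key=lambda rifle: player[rifle])


def favorite_share(players):
    """Return {country: {rifle: percent of players with that favorite}}."""
    counts = defaultdict(Counter)
    for player in players:
        counts[player["country"]][favorite_rifle(player)] += 1
    shares = {}
    for country, counter in counts.items():
        total = sum(counter.values())
        shares[country] = {r: 100.0 * counter[r] / total for r in RIFLES}
    return shares


shares = favorite_share(players)
for country, row in sorted(shares.items()):
    print(country, row)
```

Each row sums to 100%, as the rows of the table do up to rounding.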

Conclusion

• Sufficient reporting speed to handle high-velocity data flows
• Fast enough to perform analysis on the fly in real time

BioCAM Web Service

BioCAM Web Service

• Core BioCAM Analytics Engine
• Duda Web Services Framework (http://duda.io)
• Monkey Web Server (http://monkey-project.com)
• HTTP(S)/JSON Web Service Interface
• Create multiple BioCAM instances with different schemes
• Arbitrarily deep breakdowns for various kinds of analysis
• Each breakdown serves multiple aggregates
• Drill-downs natively supported by the Web Service API

[Diagram: components Duda, BioCAM, Monkey, HTTP/JSON]

RTDS (Real-Time Data Storage)

• NoSQL graph database to persistently store generic interconnected objects in an application
• Linked directly into the application to store its state
• Designed for telecom requirements
  • 24/7 always low latency (no maintenance windows!), 1+1 mirroring, fast switchover and failover, upgrades at runtime
• Side effect: low overhead and energy efficient

[Diagram: components Duda, BioCAM, Monkey, HTTP/JSON, RTDS]

Real-Time Data Storage (RTDS)

• Persistent NoSQL graph database
  • Stores generic interconnected objects in an application
• Linked directly into the application to store its state
• Low overhead
• Energy efficient

 

Real-Time Data Storage cont.

• Designed for telecom requirements
  • 24/7 always low latency
  • No maintenance windows
  • 1+1 mirroring
  • Fast switchover and failover
  • Upgrades at runtime

RTDS – Internal Workings

• Data is stored as a transaction log
  • Proven method; provides atomic transactions, audit history and correctly ordered updates in the hot standby instance
  • Robust in crash scenarios (corruption at the end of the log only)
• Self-rotating transaction log
  • No checkpointing (as it introduces latency and peaks in CPU/RAM usage)
  • A background traversal of all objects writes their latest state to the log; when it completes, the log is rotated
  • ~1% of CPU, no latency peaks, no resource peaks; only the last two logs are required to restore the complete state
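To make the self-rotating log concrete, here is a toy in-memory Python model of the idea. It is only a sketch under simplifying assumptions and not RTDS code: the class and method names are invented, logs are plain lists instead of files, objects are never deleted, and the traversal is driven by explicit calls instead of a background thread.

```python
class RotatingLogStore:
    """Toy model of a transaction log that rotates without checkpointing."""

    def __init__(self):
        self.objects = {}      # live state: one RAM copy per object
        self.logs = [[]]       # logs[-1] is the active log
        self._pending = None   # keys still to be written by the traversal

    def commit(self, key, value):
        # Every update is applied in RAM and appended to the active log.
        self.objects[key] = value
        self.logs[-1].append((key, value))

    def traverse_step(self):
        """Write one object's latest state to the log.

        Returns True when the traversal has covered all objects and the
        log has been rotated, otherwise False.
        """
        if self._pending is None:
            self._pending = list(self.objects)
        if self._pending:
            key = self._pending.pop()
            self.logs[-1].append((key, self.objects[key]))
        if self._pending:
            return False
        self._pending = None
        # Rotate: keep the completed log plus a fresh active log.
        self.logs = [self.logs[-1], []]
        return True

    def restore(self):
        # Replaying the retained logs in order rebuilds the full state.
        state = {}
        for log in self.logs:
            for key, value in log:
                state[key] = value
        return state


store = RotatingLogStore()
store.commit("a", 1)
store.commit("b", 2)
while not store.traverse_step():   # first traversal, then rotation
    pass
store.commit("a", 3)               # update while the next traversal runs
while not store.traverse_step():   # second rotation drops the oldest log
    pass
store.commit("c", 4)
restored = store.restore()
```

In this model the completed log always contains either a snapshot entry or a newer commit for every object, which is why the last two logs are enough to rebuild the state.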

Real-Time Data Storage cont.

• Default operation: asynchronous, without locks
  • Lock-free algorithms to get and commit transaction buffers
  • Background threads for log flushing and mirroring
  • Avoids latency and priority inversions
  • Locks are engaged in overload situations
• Overhead: one RAM copy per object
  • Used for background traversal, verifying state consistency etc.

Three companies, one binary!

[Diagram: components RTDS, Duda, BioCAM, Monkey; companies Monkey Software Company, Oricane AB, Xarepo AB]

BioCAM – Internal Representation

• Records consist of value fields and class fields
• Value fields are typically numbers (price, quantity, temperature etc.)
• Three types of class fields
  • Explicit: color, brand, country etc.
  • Implicit: timestamp falling within hour, week, month etc.
  • Synthetic: favorite assault rifle
• Class field values are mapped to unsigned integers
• Master key built by packing class fields into a large unsigned integer

[Diagram: master key layout: Class field 1 | Class field 2 | Class field 3 | Class field 4 | Class field 5]
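The packing of class fields into one master key can be illustrated with a short Python sketch. The field names, bit widths and helper names below are assumptions chosen for the example; the slides do not state how BioCAM sizes or orders the fields.

```python
# Hypothetical layout: (class field name, number of bits), most significant first.
LAYOUT = [("country", 8), ("brand", 6), ("color", 4)]


class ValueMapper:
    """Maps class field values to small unsigned integers in first-seen order."""

    def __init__(self):
        self.codes = {}

    def encode(self, value):
        return self.codes.setdefault(value, len(self.codes))


def pack_master_key(codes, layout=LAYOUT):
    """Pack one integer code per class field into a single unsigned integer."""
    key = 0
    for name, bits in layout:
        code = codes[name]
        if not 0 <= code < (1 << bits):
            raise ValueError(f"{name} code {code} does not fit in {bits} bits")
        key = (key << bits) | code
    return key


def unpack_master_key(key, layout=LAYOUT):
    """Recover the per-field codes from a packed master key."""
    codes = {}
    for name, bits in reversed(layout):
        codes[name] = key & ((1 << bits) - 1)
        key >>= bits
    return codes


colors = ValueMapper()
white, red = colors.encode("White"), colors.encode("Red")   # 0 and 1

codes = {"country": 3, "brand": 2, "color": red}
master_key = pack_master_key(codes)
print(master_key, unpack_master_key(master_key))
```

With these widths the key for country 3, brand 2, color 1 is ((3 << 6 | 2) << 4) | 1 = 3105.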

Breakdown

• Multi-branch tree structure
• Each level corresponds to a unique class field
• Not all class fields need to be present
• Branches correspond to class field values
• The branches (field values) traversed from root to leaf are called a path
• Records matching a path are recorded in the corresponding leaf

Breakdown Construction

• For each record a handle is created
• Each handle contains a reference to the record and a slave key
• The slave key is an integer representation of the path, where field values from higher levels are stored in more significant bits
• The array of handles is sorted by increasing slave key
• The implicit tree structure is built bottom-up from the sorted array

Computational complexity is dominated by sorting!
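The construction steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the BioCAM algorithm: records already carry integer class field codes, the bit widths are invented, and the "implicit tree" is represented as one list of (key prefix, start, end) ranges into the sorted handle array per level, derived from the deepest level upward.

```python
from itertools import groupby


def build_breakdown(records, levels):
    """Build a breakdown from records.

    levels: list of (class field name, bits), root level first.
    Returns (handles, tree) where handles is the sorted list of
    (slave_key, record_index) and tree[d] lists the nodes at depth d + 1
    as (key_prefix, start, end) ranges into handles.
    """
    handles = []
    for index, record in enumerate(records):
        slave_key = 0
        for name, bits in levels:
            # Higher levels end up in more significant bits.
            slave_key = (slave_key << bits) | record[name]
        handles.append((slave_key, index))

    handles.sort()  # this step dominates the cost

    tree = []
    shift = 0
    for depth in range(len(levels), 0, -1):
        nodes = []
        start = 0
        # Runs of handles sharing the same key prefix form one node.
        for prefix, run in groupby(handles, key=lambda h: h[0] >> shift):
            size = len(list(run))
            nodes.append((prefix, start, start + size))
            start += size
        tree.append(nodes)
        shift += levels[depth - 1][1]  # drop the deepest level's bits
    tree.reverse()
    return handles, tree


# Hypothetical records with integer codes for two class fields.
records = [
    {"brand": 1, "color": 0},
    {"brand": 0, "color": 2},
    {"brand": 1, "color": 0},
    {"brand": 0, "color": 1},
]
handles, tree = build_breakdown(records, [("brand", 2), ("color", 2)])
```

Because the handles are sorted, every node is a contiguous range of the array, so no explicit pointers between nodes are needed.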

Aggregates

• Zero or more aggregates are associated with each breakdown
• Aggregate values are associated with breakdown nodes and leaves
• Aggregate functions are associated with breakdown levels
• Leaf aggregate values are computed from value fields in the records using the leaf aggregate function
• Node aggregate values are computed from the children's aggregate values using the node aggregate function
• Typically only one value field in the records is considered
• Typically aggregate functions are identical between levels
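The split between leaf and node aggregate functions can be shown with a small Python sketch. It assumes a dictionary keyed by path instead of BioCAM's packed keys, uses a single node function for all levels, and the records are invented for illustration.

```python
from collections import defaultdict


def aggregate(records, levels, value_field, leaf_fn=sum, node_fn=sum):
    """Return {path: aggregate value} for every leaf, node and the root ().

    levels: class field names, root level first.
    leaf_fn: applied to the value fields of the records in a leaf.
    node_fn: applied to the aggregate values of a node's children.
    """
    leaf_values = defaultdict(list)
    for record in records:
        path = tuple(record[name] for name in levels)
        leaf_values[path].append(record[value_field])

    current = {path: leaf_fn(values) for path, values in leaf_values.items()}
    result = dict(current)

    # Move up one level at a time, combining children into their parent.
    for depth in range(len(levels) - 1, -1, -1):
        children = defaultdict(list)
        for path, value in current.items():
            children[path[:depth]].append(value)
        current = {path: node_fn(values) for path, values in children.items()}
        result.update(current)
    return result


# Hypothetical sales records.
records = [
    {"brand": "Audi", "color": "White", "sales": 3},
    {"brand": "Audi", "color": "White", "sales": 2},
    {"brand": "Audi", "color": "Red", "sales": 5},
    {"brand": "Volvo", "color": "Blue", "sales": 7},
]
totals = aggregate(records, ["brand", "color"], "sales")
counts = aggregate(records, ["brand", "color"], "sales", leaf_fn=len)
best = aggregate(records, ["brand", "color"], "sales", node_fn=max)
```

Here `totals` sums sales at every level, `counts` counts records per leaf and sums the counts upward, and `best` shows a case where leaf and node functions differ (sum per leaf, maximum per node).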

Example

Country: Sweden (S), Finland (F), Denmark (D), Norway (N)
Brand: Audi (A), Ford (F), Volvo (V)
Color: White (W), Red (R), Blue (B)
Breakdown: Brand, Color, Country
Aggregate: Sales

Example

[Diagram: breakdown tree. Brand level: A, F, V. Color level: W, R, B under each brand. Country level: D, F, N, S under each color. Example path: Audi, White, Finland]

Traditional Analytics in Retail

1. E-receipts sent to the data warehouse
2. Analysis of new and historical data
3. Infrequent reports (once per week etc.)

Data not relevant to "what's happening now" is involved in the analysis

Real-Time On-the-Fly Analytics in Retail

1. E-receipts sent to the data warehouse
2. E-receipts intercepted/sent in real time to BioCAM WS
3. Analysis performed on the fly
4. Reporting in real time

Real-time monitoring, analysis and reporting with minimum stress on the data warehouse

[Diagram: e-receipt flow, steps 1 to 4, via BioCAM Web Service]

Whatever Mart, Inc. The Multi Tera Dollar Retail Corporation

• 1,500 stores distributed across the globe, open 10:00-18:00
• 15,000 unique products when taking size, color etc. into account
• Customers purchase an average of 30 random products in each open store every second
• At peak rate, 2,300 customers purchase 45,000 products per second, thus surpassing 500,000 USD per second in net sales
• E-receipts are reported immediately to the BioCAM Web Service
• Five different analyses are performed every ten seconds
• Reports are presented on a dashboard and updated in real time
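The peak figures above follow from simple arithmetic, worked through in the Python snippet below. The derived averages (products per customer, net sales per product) are not stated on the slide; they are computed here from the quoted numbers, with 500,000 USD treated as a lower bound.

```python
stores = 1_500
products_per_store_per_second = 30

# All stores open at once gives the quoted peak product rate.
peak_products_per_second = stores * products_per_store_per_second

peak_customers_per_second = 2_300
peak_sales_usd_per_second = 500_000  # "surpassing", so a lower bound

# Derived averages at peak (not stated on the slide).
products_per_customer = peak_products_per_second / peak_customers_per_second
usd_per_product = peak_sales_usd_per_second / peak_products_per_second

print(peak_products_per_second)          # 45000
print(round(products_per_customer, 1))   # 19.6
print(round(usd_per_product, 2))         # 11.11
```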

Whatever Mart, Inc. The Multi Tera Dollar Retail Corporation

Almost 1,000 billion transactions since launch

whatever.oricane.com  

Benchmarks

Configurations:
• Web Service – access via the Web Service front end
• Direct Access – test program linked with BioCAM, access via the C API
• Stripped – direct access to BioCAM stripped of RTDS
• Four different database sizes (number of records)
• Six different transaction loads (record updates per second)

Aggregate Value Re-calculation Time

• 2,500-3,000 record transactions per second
• Re-calculation speed does not depend on transactions/second
• Measured in milliseconds

Web Service   Direct Access   Stripped
35            31              29
167           153             133
804           711             650
1580          1429            1302

Transaction Time

Web Service              Direct Access            Stripped
Load (x/s)   Time (µs)   Load (x/s)   Time (µs)   Load (x/s)   Time (µs)
454          1201        407          183         483          144
1463         1824        1246         161         1464         125
2510         2684        2036         143         2275         118
2930         3064        2408         132         2772         109
4568         32150       3414         128         4107         100
5975         235471      4583         120         5742         91

Direct  Access

Stripped

Conclusion

• Aggregate value re-calculation cost is linear in database size, as expected, since the optimized re-calculation scheme is not yet implemented
• Transaction cost is completely dominated by the Web Service front end, especially at higher load
• It would be interesting to bypass the web server and run JSON over IP
• Transaction cost for Direct Access and Stripped decreases with higher load, most likely due to reduced context switching and higher cache locality
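These observations can be checked against the benchmark figures reported earlier. The Python snippet below copies the numbers from the Transaction Time and Aggregate Value Re-calculation Time tables; the helper names are made up for this check, and the row-by-row comparison of re-calculation times assumes that each table row corresponds to the same database size across configurations.

```python
# (load in record updates/s, transaction time in microseconds) per configuration.
web_service = [(454, 1201), (1463, 1824), (2510, 2684),
               (2930, 3064), (4568, 32150), (5975, 235471)]
direct_access = [(407, 183), (1246, 161), (2036, 143),
                 (2408, 132), (3414, 128), (4583, 120)]
stripped = [(483, 144), (1464, 125), (2275, 118),
            (2772, 109), (4107, 100), (5742, 91)]


def times(rows):
    return [time_us for _load, time_us in rows]


def strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))


def strictly_decreasing(values):
    return all(a > b for a, b in zip(values, values[1:]))


# Web Service transaction time grows with load; the other two shrink.
print(strictly_increasing(times(web_service)))    # True
print(strictly_decreasing(times(direct_access)))  # True
print(strictly_decreasing(times(stripped)))       # True

# Growth of the Web Service transaction time from lowest to highest load.
web_growth = times(web_service)[-1] / times(web_service)[0]

# Re-calculation times in milliseconds, one entry per table row.
recalc_web = [35, 167, 804, 1580]
recalc_stripped = [29, 133, 650, 1302]
recalc_overhead = [w / s for w, s in zip(recalc_web, recalc_stripped)]
```

The Web Service transaction time grows by a factor of roughly 196 between the lowest and highest load, while its re-calculation time stays within 20 to 30 percent of the Stripped configuration, which matches the conclusion that the front end dominates transaction cost rather than analysis cost.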

Key Application Area: Gaming

• Counter-Strike: Global Offensive (CSGO) real-time statistics site to be launched
• Currently 150,000 players online simultaneously
• Player base grows exponentially
• Partnership with world #1 CSGO team Ninjas in Pyjamas (www.nip.gl)

Image source: http://www.pcgamer.com/valve-explains-how-csgo-became-the-second-most-played-game-on-steam/

Key Application Area: Energy

• Oricane is involved in Cloudberry Datacenters (http://www.cloudberry-datacenters.com)
  • Focus is on energy savings in data centers; discussions are slow…
• Oricane wants to address:
  • Energy production
  • Energy trading
  • Embedded applications
• Looking for a fast-paced key partner with lots of data to process
• Pilot project: value creation from ultra-high analytics performance