Big Data Router for Real-Time Analytics
Real-time Analytics – How it Started…
Battlefield 3 Player Statistics
• EA collected 50 TB/day in 2013.
• Available player stats sites:
  • http://battlelog.battlefield.com
  • http://bf3stats.com
• Features per gun/vehicle/class leaderboards etc.
• Geo-leaderboards introduced when Battlefield 4 was released in November 2013.
• Lacks interesting analysis!
Harvested Player Data from bf3stats.com
• Roughly 2 million player records
• Each player record has 1076 fields
• Effectively a spreadsheet with 2 billion cells
Details:
• Each player record has a field country.
• Each player record has fields for all assault rifles: AK-74, M416, M16, AEK-971, F2000, FAMAS, AUG-A3, KH-2002, AN-94, G3A3, SCAR-L, L85A2
Question
For each country and assault rifle: what percentage of players have that rifle as their favorite assault rifle?
bf3stats (MongoDB): >1 h
BioCAM RAW: 37 milliseconds
[Chart: Favorite Assault Rifle, query time in log10(milliseconds): bf3stats (MongoDB) 6.56, BioCAM RAW 1.57]
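The question amounts to a grouped count followed by a normalization per country. A minimal Python sketch, assuming each player record is a dict with a `country` field and a per-rifle `kills` mapping (both field names are hypothetical) and assuming "favorite" means the rifle with the most kills, which the slides do not define:

```python
from collections import Counter, defaultdict

RIFLES = ["AK-74", "M416", "M16", "AEK-971", "F2000", "FAMAS",
          "AUG-A3", "KH-2002", "AN-94", "G3A3", "SCAR-L", "L85A2"]


def favorite_rifle(player):
    """Return the rifle with the most kills, or None if the player has none.

    Assumption: "favorite" = most kills; the real synthetic field may differ.
    """
    kills = player.get("kills", {})
    best = max(RIFLES, key=lambda rifle: kills.get(rifle, 0))
    return best if kills.get(best, 0) > 0 else None


def favorite_share_by_country(players):
    """Percentage of players per country whose favorite is each rifle."""
    counts = defaultdict(Counter)
    for player in players:
        rifle = favorite_rifle(player)
        if rifle is not None:
            counts[player["country"]][rifle] += 1
    shares = {}
    for country, counter in counts.items():
        total = sum(counter.values())
        shares[country] = {r: 100.0 * counter[r] / total for r in RIFLES}
    return shares
```

Players without any assault rifle kills are skipped, so each country's row sums to 100%.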
country_name        AK-74    M416     M16      AEK-971  F2000    FAMAS    AUG-A3   KH-2002  AN-94    G3A3     SCAR-L   L85A2
Sweden              12.31%   20.98%   27.32%   19.13%   7.43%    3.65%    2.26%    1.87%    1.20%    2.11%    0.39%    1.34%
United States       11.19%   23.68%   25.80%   16.53%   8.05%    4.26%    2.63%    1.71%    1.45%    2.26%    0.62%    1.83%
Russian Federation  22.95%   12.96%   22.35%   26.44%   6.09%    1.85%    1.85%    1.76%    1.57%    1.18%    0.35%    0.66%
France              11.72%   17.02%   33.34%   14.88%   8.79%    6.71%    2.15%    1.79%    0.90%    1.34%    0.35%    1.01%
United Kingdom      13.34%   21.40%   26.52%   16.34%   7.68%    4.03%    2.45%    1.65%    1.05%    1.72%    0.43%    3.40%
Extract from the Analysis
Conclusion: Players have a preference for weapons used by their country's armed forces!
Conclusion
• Sufficient reporting speed to handle high-velocity data flows
• Fast enough to perform analysis in real time, on the fly
BioCAM Web Service
• Core BioCAM Analytics Engine
• Duda Web Services Framework (http://duda.io)
• Monkey Web Server (http://monkey-project.com)
• HTTP(S)/JSON Web Service Interface
• Create multiple BioCAM instances with different schemes
• Arbitrarily deep breakdowns for various kinds of analysis
• Each breakdown serves multiple aggregates
• Drill-downs natively supported from the Web Service API
[Diagram: Duda, BioCAM, Monkey, HTTP/JSON]
RTDS (Real-Time Data Storage)
• NoSQL graph database to persistently store generic interconnected objects in an application
• Linked directly into the application to store its state
• Designed for telecom requirements
  • 24/7 always low latency (no maintenance windows!), 1+1 mirroring, fast switchover and failover, upgrades at runtime
• Side effect: low overhead and energy efficient
[Diagram: Duda, BioCAM, Monkey, HTTP/JSON, RTDS]
Real-Time Data Storage (RTDS)
• Persistent NoSQL graph database
• Stores generic interconnected objects in an application
• Linked directly into the application to store its state
• Low overhead
• Energy efficient
Real-Time Data Storage cont.
• Designed for telecom requirements
  • 24/7 always low latency
  • No maintenance windows
  • 1+1 mirroring
  • Fast switchover and failover
  • Upgrades at runtime
RTDS – Internal Workings
• Data is stored as a transaction log
  • Proven method, provides atomic transactions, audit history and correctly ordered updates in the hot standby instance
  • Robust in crash scenarios (corruption at the end of the log only)
• Self-rotating transaction log
  • No checkpointing (as it introduces latency and peaks in CPU/RAM resources)
  • Background traversal of all objects writes their latest state to the log; when the traversal is complete, the log is rotated
  • ~1% of CPU, no latency peaks, no resource peaks, only the last two logs required for restoring the complete state
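The log rotation idea can be illustrated with a toy in-memory model. This is a sketch under assumptions, not the RTDS implementation: the class and method names are hypothetical, deletions and the on-disk format are left out, and the background traversal is driven by explicit calls instead of a thread.

```python
class TransactionLogStore:
    """Toy in-memory model of a self-rotating transaction log (hypothetical)."""

    def __init__(self):
        self.state = {}          # live objects: key -> latest value
        self.previous_log = []   # last completed log
        self.current_log = []    # log currently being appended to
        self._pending = None     # keys the traversal still has to visit

    def commit(self, updates):
        """Append one atomic transaction to the log, then apply it."""
        self.current_log.append(dict(updates))
        self.state.update(updates)

    def traverse_step(self):
        """Do one unit of background work; return True when the log rotates."""
        if self._pending is None:
            self._pending = list(self.state)      # start a new traversal
        if self._pending:
            key = self._pending.pop()
            self.current_log.append({key: self.state[key]})
            return False
        # Every object has been rewritten into the current log: rotate.
        self.previous_log, self.current_log = self.current_log, []
        self._pending = None
        return True

    @staticmethod
    def restore(previous_log, current_log):
        """Rebuild the state by replaying the last two logs in order."""
        state = {}
        for transaction in previous_log + current_log:
            state.update(transaction)
        return state
```

Because a completed log contains the latest state of every object that existed when its traversal started, plus every transaction committed while it was active, replaying it followed by the current log is enough to rebuild the state; older logs can be dropped.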
Real-Time Data Storage cont.
• Default operation: asynchronous, without locks
  • Lock-free algorithms to get and commit transaction buffers
  • Background threads for log flushing and mirroring
  • Avoids latency and priority inversions
• Locks will be engaged in overload situations
• Overhead: one RAM copy per object
  • For background traversal, verifying state consistency etc.
Three companies, one binary!
[Diagram: RTDS, Duda, BioCAM and Monkey combined in one binary; companies: Monkey Software Company, Oricane AB, Xarepo AB]
BioCAM – Internal Representation
• Records consist of value fields and class fields
• Value fields are typically numbers (price, quantity, temperature etc.)
• Three types of class fields
  • Explicit: color, brand, country etc.
  • Implicit: timestamp falling within hour, week, month etc.
  • Synthetic: favourite assault rifle
• Class field values are mapped to unsigned integers
• Master key built by packing class fields into a large unsigned integer
[Diagram: master key layout: Class field 1 | Class field 2 | Class field 3 | Class field 4 | Class field 5]
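Packing class fields into a master key can be sketched as follows. The names (`ClassField`, `pack_master_key`, `unpack_master_key`) are hypothetical, and placing the first field in the most significant bits is an assumption made for the example; Python's unbounded integers stand in for the "large unsigned integer".

```python
class ClassField:
    """One class field: maps its values to small unsigned integers."""

    def __init__(self, name, values):
        self.name = name
        self.codes = {value: code for code, value in enumerate(values)}
        # Bits needed for the largest code (at least one bit).
        self.bits = max(1, (len(values) - 1).bit_length())


def pack_master_key(record, fields):
    """Pack the record's class field codes into one integer.

    Assumption: the first field ends up in the most significant bits.
    """
    key = 0
    for field in fields:
        key = (key << field.bits) | field.codes[record[field.name]]
    return key


def unpack_master_key(key, fields):
    """Recover the per-field codes from a master key."""
    codes = {}
    for field in reversed(fields):
        codes[field.name] = key & ((1 << field.bits) - 1)
        key >>= field.bits
    return codes
```

With three fields of three, three and four values, each field needs two bits, so a record fits in a six-bit key.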
Breakdown
• Multi-branch tree structure
• Each level corresponds to a unique class field
• Not all class fields need to be present
• Branches correspond to class field values
• The branches (field values) traversed from root to leaf are called a path
• Records matching a path are recorded in the corresponding leaf
Breakdown Construction
• For each record a handle is created
• Each handle contains a reference to the record and a slave key
• The slave key is an integer representation of the path, where field values from higher levels are stored in more significant bits
• Array of handles is sorted by increasing slave keys
• Implicit tree structure is built bottom-up from the sorted array
Computational complexity dominated by sorting!
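The construction steps can be sketched in Python. This is an illustration under assumptions, not BioCAM's code: the field names and codes, the fixed two bits per level, and the representation of a handle as a `(slave key, record index)` tuple are all made up for the example.

```python
LEVELS = ["brand", "color", "country"]          # breakdown levels, root first
CODES = {
    "brand": {"Audi": 0, "Ford": 1, "Volvo": 2},
    "color": {"White": 0, "Red": 1, "Blue": 2},
    "country": {"Sweden": 0, "Finland": 1, "Denmark": 2, "Norway": 3},
}
BITS = 2  # bits per level; enough for up to four values per field


def slave_key(record):
    """Integer form of the path; higher levels land in more significant bits."""
    key = 0
    for level in LEVELS:
        key = (key << BITS) | CODES[level][record[level]]
    return key


def build_breakdown(records):
    """Return (handles, levels); levels[d] lists (prefix, start, end) runs at depth d."""
    # A handle is (slave key, record index); sorting dominates the cost.
    handles = sorted((slave_key(r), i) for i, r in enumerate(records))

    def merge(runs):
        # Fuse adjacent runs that share the same key prefix.
        merged = []
        for prefix, start, end in runs:
            if merged and merged[-1][0] == prefix:
                merged[-1] = (prefix, merged[-1][1], end)
            else:
                merged.append((prefix, start, end))
        return merged

    # Leaves: runs of equal slave keys in the sorted array.
    levels = [merge((key, pos, pos + 1) for pos, (key, _) in enumerate(handles))]
    # Bottom-up: dropping the lowest field turns adjacent siblings into one parent run.
    for _ in LEVELS:
        levels.append(merge((prefix >> BITS, s, e) for prefix, s, e in levels[-1]))
    levels.reverse()
    return handles, levels
```

Each node is just a half-open range `[start, end)` into the sorted handle array, so the tree stays implicit: after the sort, every level is produced by one linear pass over the level below it.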
Aggregates
• Zero or more aggregates are associated with each breakdown
• Aggregate values are associated with breakdown nodes and leaves
• Aggregate functions are associated with breakdown levels
• Leaf aggregate values are computed from value fields in the records using the leaf aggregate function
• Node aggregate values are computed from the children's aggregate values using the node aggregate function
• Typically only one value field in records is considered
• Typically aggregate functions are identical between levels
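The split between leaf and node aggregate functions can be sketched as below. For readability the sketch uses an explicit nested-dict tree instead of a sorted handle array; the function names are hypothetical, and the field names follow the deck's Brand, Color, Country example.

```python
def build_tree(records, levels):
    """Nested dicts keyed by class field value; leaves are lists of records."""
    root = {}
    for record in records:
        node = root
        for level in levels[:-1]:
            node = node.setdefault(record[level], {})
        node.setdefault(record[levels[-1]], []).append(record)
    return root


def compute_aggregates(node, leaf_fn, node_fn, path=(), out=None):
    """Map every path (tuple of branch values) to its aggregate value."""
    out = {} if out is None else out
    if isinstance(node, list):
        # Leaf: aggregate directly over the records' value fields.
        out[path] = leaf_fn(node)
    else:
        for branch, child in node.items():
            compute_aggregates(child, leaf_fn, node_fn, path + (branch,), out)
        # Inner node: aggregate over the children's aggregate values only.
        out[path] = node_fn([out[path + (branch,)] for branch in node])
    return out


def total_sales(records):
    """Leaf aggregate function: sum of the single value field `sales`."""
    return sum(record["sales"] for record in records)
```

Calling `compute_aggregates(build_tree(records, ["brand", "color", "country"]), total_sales, sum)` uses the same operation (a sum) at every level, the typical case named above; passing `len` as the leaf function and `sum` as the node function would instead count records per path.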
Example
Country: Sweden (S), Finland (F), Denmark (D), Norway (N)
Brand: Audi (A), Ford (F), Volvo (V)
Color: White (W), Red (R), Blue (B)
Breakdown: Brand, Color, Country
Aggregate: Sales
[Diagram: breakdown tree. Level 1, Brand: A, F, V. Level 2, Color: W, R, B under each brand. Level 3, Country: D, F, N, S under each color. Example path: Audi, White, Finland.]
Traditional Analytics in Retail
1. E-receipts sent to Data Warehouse
2. Analysis of new and historical data
3. Infrequent reports (once per week etc.)
Data not relevant to "what's happening now" is involved in the analysis
Real-time On-the-fly Analytics in Retail
1. E-receipts sent to Data Warehouse
2. E-receipts intercepted/sent in real time to BioCAM WS
3. Analysis performed on the fly
4. Reporting in real time
Real-time monitoring, analysis and reporting with minimum stress on the data warehouse
Whatever Mart, Inc.
The Multi Tera Dollar Retail Corporation
• 1,500 stores distributed across the globe, open 10:00-18:00
• 15,000 unique products when taking size, color etc. into account
• Customers purchase an average of 30 random products in each open store every second
• At peak rate, 2,300 customers purchase 45,000 products per second, thus surpassing 500,000 USD per second in net sales
• E-receipts are reported immediately to BioCAM Web Service
• Five different analyses are performed every ten seconds
• Reports are presented on a dashboard and updated in real time
Whatever Mart, Inc.
The Multi Tera Dollar Retail Corporation
Almost 1000 billion transactions since launch
whatever.oricane.com
Benchmarks
Configurations:
• Web Service – Access via Web Service front-end
• Direct access – Test program linked with BioCAM, access via C API
• Stripped – Direct access to BioCAM stripped of RTDS
• Four different database sizes (number of records)
• Six different transaction loads (record updates per second)
Aggregate Value Re-calculation Time
• 2500-3000 record transactions per second
• Re-calculation speed not dependent on transactions/second
• Measured in milliseconds

Web Service   Direct Access   Stripped
35            31              29
167           153             133
804           711             650
1580          1429            1302
Transaction Time

Web Service               Direct Access             Stripped
Load (x/s)   Time (µs)    Load (x/s)   Time (µs)    Load (x/s)   Time (µs)
454          1201         407          183          483          144
1463         1824         1246         161          1464         125
2510         2684         2036         143          2275         118
2930         3064         2408         132          2772         109
4568         32150        3414         128          4107         100
5975         235471       4583         120          5742         91
Direct Access
Stripped
Conclusion
• Aggregate value re-calculation cost is linear in database size, as expected, since the optimized re-calculation scheme is not yet implemented
• Transaction cost is completely dominated by the Web Service front-end, especially at higher load
• Would be interesting to bypass the web server and run JSON over IP
• Transaction cost for Direct Access and Stripped decreases with higher load, most likely due to reduced context switching and higher cache locality
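Both transaction-cost observations can be checked against the Transaction Time table. The sketch below copies the measured values from that table; the helper names are hypothetical, and because the three configurations were measured at slightly different loads, the row-by-row ratio is only a rough indication.

```python
# (load in x/s, time in microseconds) pairs copied from the Transaction Time table.
WEB_SERVICE = [(454, 1201), (1463, 1824), (2510, 2684),
               (2930, 3064), (4568, 32150), (5975, 235471)]
DIRECT_ACCESS = [(407, 183), (1246, 161), (2036, 143),
                 (2408, 132), (3414, 128), (4583, 120)]
STRIPPED = [(483, 144), (1464, 125), (2275, 118),
            (2772, 109), (4107, 100), (5742, 91)]


def front_end_factor(web, direct):
    """Web Service time divided by Direct Access time, row by row."""
    return [w_time / d_time for (_, w_time), (_, d_time) in zip(web, direct)]


def is_decreasing(series):
    """True if the transaction time drops with every increase in load."""
    times = [time for _, time in series]
    return all(a > b for a, b in zip(times, times[1:]))
```

`front_end_factor(WEB_SERVICE, DIRECT_ACCESS)` grows from under 7 at the lowest load to well above 1000 at the highest, while `is_decreasing` holds for both `DIRECT_ACCESS` and `STRIPPED`.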
Key Application Area: Gaming
• Counter-Strike: Global Offensive (CSGO) real-time statistics site to be launched
• Currently 150,000 players online simultaneously
• Player base grows exponentially
• Partnership with World #1 CSGO team Ninjas in Pyjamas (www.nip.gl)
Image source: http://www.pcgamer.com/valve-explains-how-csgo-became-the-second-most-played-game-on-steam/
Key Application Area: Energy
• Oricane is involved in Cloudberry Datacenters (http://www.cloudberry-datacenters.com)
• Focus is on energy savings in data centers - discussions are slow…
• Oricane wants to address:
  • Energy production
  • Energy trading
  • Embedded applications
• Looking for a fast-paced key partner with lots of data to process
• Pilot project - value creation from ultra-high analytics performance