the gamedesigninitiative at cornell university
Data Collection
Lecture 16
The Rise of Big Data
• Big data is changing game design
  ◦ Can gather data from a huge number of players
  ◦ Can use that data to inform future content
• What can we do with all that data?
  ◦ What types of questions can we answer?
  ◦ How does it affect our business model?
• How do we collect all of this data?
  ◦ What are the technical challenges?
  ◦ What are the legal/ethical challenges?
Logging Game Data
[Diagram labels: Query 2, Log, Data Store]
Issues with Data Collection
• You have to record the data
  ◦ Capture data from the game state
  ◦ Do it so it does not hurt performance
• You have to store the data somewhere
  ◦ Lots of players means a lot of data
  ◦ Use a database or Google-style storage solution
• You have to move the data to the datastore
  ◦ Requires an internet connection (while playing?)
Instrumenting Activity
[Diagram: a Single Player Game and an Online Game (Client and Server), with "LOG?" marked at two points]
• You need to access the data log
• Diagram callouts: "Battery & Connectivity" and "Server Load"
• Final build of the diagram: "LOG? LOG?" becomes "LOG LOG"
Even This is Not Enough
• Disks (for logs) are slow
  ◦ SSD: 500 MB/s
  ◦ Standard disk: 200 MB/s
  ◦ Or about 8 MB/frame (SSD at 60 frames/s)
• Memory (for state) is fast
  ◦ RAM: ~15000 MB/s
  ◦ Or about 250 MB/frame
• Cannot write fast enough!
  ◦ Asynchronous logging?
[Diagram labels: Update, Draw, State, Log]
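A minimal sketch of asynchronous logging (not from the lecture), assuming a background worker thread and an in-memory queue. The class name `AsyncLogger` and its methods are hypothetical. It takes the slow write out of the update loop, but it does not make the disk any faster, so the volume problem remains.

```python
import io
import queue
import threading


class AsyncLogger:
    """Hypothetical logger: the game thread enqueues, a worker thread writes."""

    _STOP = object()  # sentinel telling the worker to exit

    def __init__(self, sink):
        self._sink = sink                  # any object with write(str)
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, record):
        # Called from the update loop: only an in-memory enqueue, no disk I/O.
        self._queue.put(record)

    def _drain(self):
        # Runs on the worker thread: the slow write happens here.
        while True:
            record = self._queue.get()
            if record is self._STOP:
                break
            self._sink.write(record + "\n")

    def close(self):
        # Let the worker write everything queued so far, then stop it.
        self._queue.put(self._STOP)
        self._worker.join()


sink = io.StringIO()          # stand-in for a log file
logger = AsyncLogger(sink)
for frame in range(3):
    logger.log(f"frame={frame}")
logger.close()
```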
Solution: Limit What You Collect
• Way too much data to collect everything
  ◦ Example: 500 objects with 10 integer-size fields
  ◦ 20 KB/frame or 1.2 MB/s or 72 MB/minute
  ◦ Console games have more objects, played longer
• What would you do with it even if you could?
  ◦ How do you search through all this data?
  ◦ Just saying “use a database” is not enough
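The arithmetic behind the 20 KB/frame figure, as a short check. It assumes 4-byte (32-bit) integers and 60 frames per second; neither is stated on the slide, but both are consistent with its numbers.

```python
OBJECTS = 500
FIELDS_PER_OBJECT = 10
BYTES_PER_FIELD = 4        # assumption: 32-bit integers
FRAMES_PER_SECOND = 60     # assumption: 60 frames/s

bytes_per_frame = OBJECTS * FIELDS_PER_OBJECT * BYTES_PER_FIELD
bytes_per_second = bytes_per_frame * FRAMES_PER_SECOND
bytes_per_minute = bytes_per_second * 60

print(bytes_per_frame / 1e3, "KB/frame")
print(bytes_per_second / 1e6, "MB/s")
print(bytes_per_minute / 1e6, "MB/minute")
```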
Aside: Asynchronous Still Desirable
• I/O has problems with jitter
  ◦ Throughput is not constant
  ◦ Might be in use elsewhere
• Update loop also has jitter
  ◦ May finish before 16 ms
  ◦ “Sleeps” until next frame
• Remove jitter with a budget
  ◦ Use time at end of update
  ◦ Write as much as possible
  ◦ Will keep up in the long run
[Diagram: Update and Draw use the Game State every frame; Log State runs periodically (with budget)]
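A minimal sketch of budgeted logging (not from the lecture). The function name `flush_with_budget` and its parameters are hypothetical. The idea is to call it at the end of each update with whatever time is left in the frame; records that do not fit stay queued for a later frame.

```python
import time
from collections import deque


def flush_with_budget(pending, write, budget, clock=time.perf_counter):
    """Write queued records until the queue is empty or the budget is spent.

    `budget` is in the same units as `clock()` (seconds by default).
    Returns the number of records written.
    """
    deadline = clock() + budget
    written = 0
    while pending and clock() < deadline:
        write(pending.popleft())
        written += 1
    return written


# Deterministic demo: a fake clock that advances by 1 unit per call.
ticks = iter(range(100))
pending = deque(["a", "b", "c", "d", "e"])
out = []
written = flush_with_budget(pending, out.append, budget=3,
                            clock=lambda: next(ticks))
```

With the fake clock only two records fit in the budget; the other three remain in `pending` and would be written in later frames.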
The Classic Tradeoff
Performance
• Only collect what you need
  ◦ Overcoming challenges
  ◦ Significant game events
  ◦ Unusual actions/interactions
• Instrumentation is complex
  ◦ Identify the source of event
  ◦ Find this location in code
  ◦ Add log statement there
Extensibility
• Who knows what I need?
  ◦ Resource data can get large
  ◦ Player position might matter
  ◦ Eventually have full state
• How do I get more data?
  ◦ Always log it all (no)
  ◦ Ask programmer to add more instrumentation (?)
How Do We Solve this Problem?
How Do We Process All this Data?
• How you query the data is more important
  ◦ Example: SQL, Map Reduce, custom set-up
  ◦ This often determines data format for you
• In general, use a uniform schema for the data
  ◦ All data should have the same attributes
  ◦ Avoid key-value blobs; make attributes structured
  ◦ This means you could use SQL (even if you do not)
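A minimal sketch of a uniform schema (not from the lecture). The record type `LogRecord` and its field names are hypothetical. Every record carries the same typed attributes, so the list of records maps directly onto table rows, unlike a free-form key-value blob.

```python
from dataclasses import astuple, dataclass, fields


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical uniform schema: every record has the same attributes."""
    time: int      # integer time stamp
    player: int
    event: str     # e.g. "item_sold", "quest_finished"
    item: int      # 0 when not applicable
    value: int     # 0 when not applicable


records = [
    LogRecord(time=100, player=1, event="item_sold", item=21, value=1000),
    LogRecord(time=160, player=2, event="quest_finished", item=0, value=0),
]

# Column names and rows fall straight out of the schema.
columns = [f.name for f in fields(LogRecord)]
rows = [astuple(r) for r in records]
```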
Is SQL Suitable for Analytics?
• Game state is (often) naturally relational
  ◦ Objects are a long list of attributes
  ◦ Store each game object as a row
  ◦ Lots of libraries can convert objects to SQL

key  player  x    y    health  damage  range  flying
1    1       12   342  100     50      20     false
2    1       43   12   100     50      20     false
3    2       123  90   50      30      50     true
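A minimal sketch of storing each game object as a row, using Python's standard `sqlite3` module and the three rows from the table above. The table name `gamestate` and the example query are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "key" and "range" are quoted because they are also SQL keywords.
conn.execute("""
    CREATE TABLE gamestate (
        "key" INTEGER PRIMARY KEY, player INTEGER, x INTEGER, y INTEGER,
        health INTEGER, damage INTEGER, "range" INTEGER, flying INTEGER
    )
""")
conn.executemany(
    "INSERT INTO gamestate VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 12, 342, 100, 50, 20, False),
        (2, 1, 43, 12, 100, 50, 20, False),
        (3, 2, 123, 90, 50, 30, 50, True),
    ],
)

# Example analytic query: object count and average health per player.
per_player = conn.execute("""
    SELECT player, COUNT(*), AVG(health)
    FROM gamestate
    GROUP BY player
    ORDER BY player
""").fetchall()
```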
Or Maybe Not
Can Still Do This Relationally

Player Table
player  …  inventory
1       …  10
2       …  11
3       …  12

Item Table
item  …  value
20    …  250
21    …  1000
22    …  50
23    …  100
24    …  5000

Join Table
player  item  qty
1       21    1
1       20    3
1       22    10
2       21    1
2       24    1

But how easy is this to process?
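A sketch of what "processing" the relational layout involves, using the Item Table and Join Table values above in `sqlite3`. The table names `item` and `inventory` and the question asked (total inventory value per player) are assumptions. Even this simple question already needs a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item INTEGER PRIMARY KEY, value INTEGER);
    CREATE TABLE inventory (player INTEGER, item INTEGER, qty INTEGER);
    INSERT INTO item VALUES
        (20, 250), (21, 1000), (22, 50), (23, 100), (24, 5000);
    INSERT INTO inventory VALUES
        (1, 21, 1), (1, 20, 3), (1, 22, 10), (2, 21, 1), (2, 24, 1);
""")

# Total value of each player's inventory: join quantities with item values.
totals = conn.execute("""
    SELECT inv.player, SUM(inv.qty * item.value) AS total
    FROM inventory AS inv
    JOIN item ON item.item = inv.item
    GROUP BY inv.player
    ORDER BY inv.player
""").fetchall()
```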
Analytics Involve Time-Sensitive Data
• Each event/data needs to be time-stamped
  ◦ Time it happened in seconds/milliseconds/etc.
  ◦ Typically use an int, as it makes grouping easier
  ◦ Time attribute is an integral part of queries

key  player  x    y    health  damage  range  flying  time
1    1       12   342  100     50      20     false   1209823
2    1       43   12   100     50      20     false   1309875
3    2       123  90   50      30      50     true    1724304
Example: Aggregating Daily Prices
• Average price of each item, each day
  ◦ Want to group together item transactions per day
  ◦ Use SEC_PER_DAY to convert time stamp to days
  ◦ Relatively simple aggregate query

SELECT item, avg(price), time/SEC_PER_DAY
FROM gamedata
GROUP BY item, time/SEC_PER_DAY
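A runnable version of the daily-price aggregate in `sqlite3`, on made-up sample data. It assumes time stamps in seconds and a `gamedata` table with `item`, `price`, and `time` columns; the rows are invented for illustration. Integer division of the time stamp gives the day number.

```python
import sqlite3

SEC_PER_DAY = 86_400

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gamedata (item INTEGER, price INTEGER, time INTEGER)")
conn.executemany(
    "INSERT INTO gamedata VALUES (?, ?, ?)",
    [
        (21, 1000, 10),        # day 0
        (21, 1200, 50_000),    # day 0
        (21, 900, 90_000),     # day 1
        (24, 5000, 100_000),   # day 1
    ],
)

# Group by item AND day; integer / integer is integer division in SQLite.
daily = conn.execute(
    """
    SELECT item, time / :spd AS day, AVG(price)
    FROM gamedata
    GROUP BY item, day
    ORDER BY item, day
    """,
    {"spd": SEC_PER_DAY},
).fetchall()
```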
Hard Example: Quest Order
• Return first quest completed after tutorial level
  ◦ Want a quest that is completed after tutorial
  ◦ Want no other quest completed in between
  ◦ This is known as a time-series query
[Diagram labels: Tutorial, Quest 1, Quest 2]
Time Series in SQL
• Return all pairs of consecutive time events

SELECT D1.name, D2.name
FROM Data AS D1, Data AS D2
WHERE D2.time > D1.time
EXCEPT
SELECT D1.name, D2.name
FROM Data AS D1, Data AS D2, Data AS S
WHERE S.time > D1.time AND D2.time > S.time
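The query above, run in `sqlite3` on three made-up events (the names and time stamps are assumptions). The first SELECT produces every ordered pair; the second removes pairs that have some event S between them. Note that the second part joins the table with itself three times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (name TEXT, time INTEGER)")
conn.executemany(
    "INSERT INTO Data VALUES (?, ?)",
    [("Tutorial", 100), ("Quest 1", 250), ("Quest 2", 400)],
)

pairs = conn.execute("""
    SELECT D1.name, D2.name
    FROM Data AS D1, Data AS D2
    WHERE D2.time > D1.time
    EXCEPT
    SELECT D1.name, D2.name
    FROM Data AS D1, Data AS D2, Data AS S
    WHERE S.time > D1.time AND D2.time > S.time
""").fetchall()
```

The pair (Tutorial, Quest 2) is removed because Quest 1 lies between them.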
NoSQL is not Much Help
• Time series is a correlation of actions
  ◦ Events X that happen after Y
  ◦ Correlation requires a DB join
• NoSQL means Map Reduce
  ◦ Map: Sort data into key, values
  ◦ Reduce: Aggregate values of same keys
  ◦ Example: Average price of an item
• Joins are hard in Map Reduce
  ◦ Google gets scalability by “cheating”
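A toy, single-process sketch of the map and reduce steps for the average-price example (not a real Map Reduce framework). The function names and the sample sales are assumptions. An aggregate per key fits the model well; correlating two different events does not, because each reduce call only sees values for one key.

```python
from collections import defaultdict

sales = [("sword", 100), ("potion", 10), ("sword", 140), ("potion", 20)]


def map_phase(records):
    # Map: emit one (key, value) pair per record.
    for item, price in records:
        yield item, price


def shuffle(pairs):
    # Group values by key (a framework would do this between the phases).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: aggregate the values that share a key.
    return {key: sum(values) / len(values) for key, values in groups.items()}


average_price = reduce_phase(shuffle(map_phase(sales)))
```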
Alternative: Data Stream Systems
• Given set of event streams
  ◦ Database with time stamps
  ◦ Can only read data in order
  ◦ Can only read data once
• Idea: Get events as they occur
  ◦ But this is a little unrealistic
  ◦ Delay to create the event
• Also have a pattern query
  ◦ Matches event/set of events
  ◦ Outputs match as new event
  ◦ Called a composite event
[Diagram: several Event Streams feed an Event Pattern Query; the output is also an Event Stream]
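A minimal sketch of the read-in-order, read-once model using Python generators (not from the lecture). The event layout `(time, kind, player)` and the function names are assumptions. `heapq.merge` combines streams that are already time-ordered without loading them fully, and the pattern query emits its matches as a new stream.

```python
import heapq

# Two time-ordered base streams of (time, kind, player) events.
quests = [(100, "quest_done", 1), (400, "quest_done", 2)]
sales = [(250, "item_sold", 1), (300, "item_sold", 2)]


def merged(*streams):
    # Lazily merge time-ordered streams; each event is read exactly once.
    return heapq.merge(*streams, key=lambda event: event[0])


def pattern_query(stream, matches):
    # Output every match as a new event, so the result is again a stream.
    for event in stream:
        if matches(event):
            yield ("match",) + event


out = list(pattern_query(merged(quests, sales), lambda e: e[2] == 1))
```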
Using Data Stream Systems
[Two diagrams: External Stream System and Internal Stream System; labels: Game State, Event Stream, Pattern, Event Stream, Data Stream System]
Using Data Stream Systems
External Event System
• Game acts as base stream
  ◦ Matching done outside
  ◦ Simple but needs bandwidth
• Several commercial options
  ◦ StreamBase
  ◦ Microsoft StreamInsight
  ◦ Oracle CEP
• Fewer open source options
  ◦ Lots of academic prototypes
Internal Event System
• Game handles all matching
  ◦ Event streams in game state
  ◦ Bandwidth only for output
• You must implement system
  ◦ Maybe use open source
  ◦ But always read licenses!
• Why do this? Complex AI
  ◦ Just like rule matching!
  ◦ BioShock Infinite does this
Did I Mention A LOT of Data?
• Amount of data might be too much for one log
• Just one player can create a significant amount
• Data probably in different categories
• Examples: Items sold, quests finished
• Put the data in appropriate databases
• Each might be on a different server
• Data streams can send data to the right place
Data Streams and Categorization
[Diagram labels: Log, Data Stream System, Query 2]
Classic Event Patterns
• Selection: P(e) (event e has property P)
  ◦ Can be applied to base or composite events
  ◦ Common property for composite events: duration
• Ordering: E1:E2 (E2 happened after E1)
• Sequences: E1;E2 (E2 is the first event after E1)
  ◦ Ranges over all events fed into the pattern query
  ◦ Sometimes want to exclude certain matches to E2
  ◦ E1 ;P E2 (E2 is first event after E1 satisfying P)
Patterns and Base Streams
Base Stream from Application (axis label: Newer Events)
(a1,b1,c1,…,t1)
(a2,b2,c2,…,t2)
(a3,b3,c3,…,t3)
(a4,b4,c4,…,t4)
(a5,b5,c5,…,t5)
(a6,b6,c6,…,t6)
(a7,b7,c7,…,t7) …
Filter Predicate: P(x)
Derived Stream “P” (axis label: Newer Events)
(a1,b1,c1,…,t1)
(a3,b3,c3,…,t3)
(a6,b6,c6,…,t6)
(a7,b7,c7,…,t7) …
• Boolean expression on the event tuple (easy to compute)
• One for each predicate
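A minimal sketch of selection with one derived stream per predicate (not from the lecture). The function name `derive_streams`, the dictionary event layout, and the two predicates are assumptions. The base stream is read once, and each predicate is a cheap Boolean test on the event.

```python
def derive_streams(base_stream, predicates):
    """Return {predicate name: events that satisfy it}, in arrival order."""
    derived = {name: [] for name in predicates}
    for event in base_stream:                      # single pass over the base
        for name, predicate in predicates.items():
            if predicate(event):                   # Boolean test on the tuple
                derived[name].append(event)
    return derived


base = [
    {"key": 1, "player": 1, "flying": False, "time": 1209823},
    {"key": 2, "player": 1, "flying": False, "time": 1309875},
    {"key": 3, "player": 2, "flying": True, "time": 1724304},
]
derived = derive_streams(base, {
    "player1": lambda e: e["player"] == 1,
    "flying": lambda e: e["flying"],
})
```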
Sequencing Streams
Input Stream E1:
(a2,b2,…,t2)
(a4,b4,…,t4)
(a7,b7,…,t7) …
Input Stream E2:
(x1,y1,…,t1)
(x5,y5,…,t5)
(x6,y6,…,t6)
(x7,y7,…,t7) …
Event Pattern: E1;E2
Derived Stream E1;E2:
(a2,…,x5,…,t2,t5)
(a4,…,x5,…,t4,t5)
…
• Have to store until a match is found
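A minimal sketch of the sequencing pattern E1;E2 (not from the lecture), assuming one merged, time-ordered input of `(stream, payload, time)` tuples. The function name `sequence` is hypothetical. Each E1 event is stored until the first strictly later E2 event arrives, then emitted as a composite event and dropped.

```python
def sequence(events):
    """Yield (e1_payload, e2_payload, t1, t2) for each E1 and its first later E2."""
    waiting = []                       # E1 events stored until a match is found
    for stream, payload, time in events:
        if stream == "E1":
            waiting.append((payload, time))
        elif stream == "E2":
            still_waiting = []
            for first_payload, first_time in waiting:
                if first_time < time:  # E2 must be strictly after E1
                    yield (first_payload, payload, first_time, time)
                else:
                    still_waiting.append((first_payload, first_time))
            waiting = still_waiting


events = [
    ("E2", "x1", 1), ("E1", "a2", 2), ("E1", "a4", 4),
    ("E2", "x5", 5), ("E2", "x6", 6), ("E1", "a7", 7),
]
matches = list(sequence(events))
```

Here `a7` has no later E2 event yet, so it would stay stored until one arrives.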
Some Words on Time Stamps
• All events must happen
  ◦ Specific time it is generated
  ◦ Crucial for time series
• Pattern output is an event!
  ◦ New stream to be reused
  ◦ Often pick latest time stamp
• But duration is important
  ◦ Good candidate for P(x)
  ◦ Allows us to timeout
  ◦ Solution: intervals
[Diagram: Base Events (B): e1, e2, e3, e4; Composite Events (B;B): e1;e2, e2;e3, e3;e4. Still lossy. Why?]
Ordering Streams
Input Stream E1:
(a2,b2,…,t2)
(a4,b4,…,t4)
(a7,b7,…,t7) …
Input Stream E2:
(x1,y1,…,t1)
(x5,y5,…,t5)
(x6,y6,…,t6)
(x7,y7,…,t7) …
Event Pattern: E1:E2
Derived Stream E1:E2:
(a2,…,t2,t5) (a4,…,t4,t5)
(a2,…,t2,t6) (a4,…,t4,t6)
(a2,…,t2,t7) (a4,…,t4,t7)
…
• O(n²) blow-up.
• Need timeouts!
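A minimal sketch of the ordering pattern E1:E2 with a timeout (not from the lecture), on the same assumed `(stream, payload, time)` input. The function name `ordering` and the timeout rule (drop an E1 event once it is more than `timeout` time units old) are assumptions. Without the timeout every stored E1 event pairs with every later E2 event, which is the quadratic blow-up.

```python
from collections import deque


def ordering(events, timeout):
    """Yield (e1_payload, e2_payload, t1, t2) for every E2 after a stored E1."""
    waiting = deque()                  # stored E1 events, oldest first
    for stream, payload, time in events:
        # Drop E1 events that are too old to be part of any further match.
        while waiting and time - waiting[0][1] > timeout:
            waiting.popleft()
        if stream == "E1":
            waiting.append((payload, time))
        elif stream == "E2":
            for first_payload, first_time in waiting:
                if first_time < time:
                    yield (first_payload, payload, first_time, time)


events = [
    ("E2", "x1", 1), ("E1", "a2", 2), ("E1", "a4", 4),
    ("E2", "x5", 5), ("E2", "x6", 6), ("E1", "a7", 7),
]
with_timeout = list(ordering(events, timeout=3))
without_timeout = list(ordering(events, timeout=float("inf")))
```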
Managing Patterns
• Analysis => lots of patterns
  ◦ Each play action has pattern
  ◦ Was purpose of the system
• Time Series => data storage
  ◦ Have to store the first part
  ◦ Drop at match (or timeout)
• Idea: Each pattern has a list
  ◦ Stores uncompleted part
  ◦ Complex time series?
• Pattern 1: P (nothing to store)
• Pattern 2: P;Q (have to store)
• Pattern 3: R ;Q S (have to store)
• Pattern 4: P;Q;R (must store, must store)
Solution: Automata
Pattern: T<30s((P ;Q R):S)
[Diagram (nondeterministic automaton): edge labels P, R, S, True, Q, < 30s; stored tuples (a3,…,t3), (a4,…,t4), (a7,…,t7), (a1,b2,…,t1,t2)]
Each edge has:
• Input event stream
• Test predicate
• (Concatenation op)
For each edge:
1. Take each tuple from out-state
2. Concatenate with stream tuple
3. Test if predicate is true
4. If so, move tuple to in-state
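A minimal sketch of the per-edge procedure (not from the lecture, and not the slide's exact pattern). It assumes a simpler linear pattern, "P, then R, then S, all within 30 time units of the P", and events of the form `(stream, time)`. The names `Edge`, `step`, and the state names are hypothetical. The start state always keeps one empty tuple so that new matches can begin at any time.

```python
from collections import namedtuple

Edge = namedtuple("Edge", "src dst stream predicate")


def within_30(candidate):
    # Time of the newest event minus time of the first event in the tuple.
    return candidate[-1][1] - candidate[0][1] < 30


edges = [
    Edge("start", "got_p", "P", lambda c: True),
    Edge("got_p", "got_pr", "R", within_30),
    Edge("got_pr", "done", "S", within_30),
]
states = {"start": [()], "got_p": [], "got_pr": [], "done": []}


def step(states, edges, stream, event):
    arrivals = {name: [] for name in states}
    moved = {name: set() for name in states}
    for edge in edges:
        if edge.stream != stream:
            continue
        for index, partial in enumerate(states[edge.src]):  # 1. take each tuple
            candidate = partial + (event,)                   # 2. concatenate
            if edge.predicate(candidate):                    # 3. test predicate
                arrivals[edge.dst].append(candidate)         # 4. move it
                moved[edge.src].add(index)
    for name in states:
        kept = [p for i, p in enumerate(states[name])
                if name == "start" or i not in moved[name]]
        states[name] = kept + arrivals[name]


for event in [("P", 0), ("R", 10), ("S", 20), ("P", 100), ("R", 150)]:
    step(states, edges, event[0], event)
```

The second P never advances because its R arrives 50 time units later; a fuller system would also time out and drop that stored tuple.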
Events and Data-Driven Development
• Designers should not program these automata
  ◦ Essentially want a nice query language
  ◦ There are some simple ones to implement
• May already have automata framework for AI
  ◦ Allow designers to specify AI behavior in script file
  ◦ Compile into automaton when necessary
• Your event system can piggyback on this
  ◦ Similar system (except for the tuple storage)
  ◦ But need a different (custom) compiler
The Main Challenge: Negation
• Common pattern:
  ◦ Find a sequence A;B with no C in between
  ◦ Example: Quest started/done with none in between
• Classic Idea: Regular expressions
  ◦ A(; ¬C)*;B (Keep checking that not C until find B)
  ◦ This is not quite right, however. Why?
  ◦ Bigger issue: should not store “not” events
• Messy technical challenge; no clean solutions
  ◦ Purpose of contexts in Irrational’s technical talk
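A minimal sketch of one hard-coded negation pattern, "A;B with no C in between" (not from the lecture, and not a general solution). The function name `sequence_without` and the `(name, time)` event layout are assumptions. Only pending A events are stored; a C event is never stored, it just discards the pending A events it invalidates.

```python
def sequence_without(events, first="A", last="B", blocker="C"):
    """Yield (a_event, b_event) pairs with no blocker event between them."""
    pending = []                          # A events still waiting for a B
    for name, time in events:
        if name == blocker:
            pending.clear()               # a C invalidates every pending A
        elif name == first:
            pending.append((name, time))
        elif name == last:
            for start in pending:
                yield (start, (name, time))
            pending.clear()               # B was the first B after these A's


events = [("A", 1), ("C", 2), ("B", 3), ("A", 4), ("B", 5), ("B", 6)]
matches = list(sequence_without(events))
```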
Putting this All Together
• We need to record the data fast
  ◦ Requires a custom logging system in game
  ◦ Have to be choosy about what data is recorded
  ◦ Need to be flexible to respond to designers
• We need to process the data fast
  ◦ Depends on the choice of query languages
  ◦ Need support for aggregation and time-series
  ◦ Data stream systems are ideal for this
Brainstorm: Mobile Data Collection