Big Data Analytics at Play: a Social Gaming industry perspective at Zynga

BigDataCloud Sept 8 2011 meetup - Big Data Analytics at Play (Social Gaming) by Tim Piatenko

Big Data Analytics at Play

a Social Gaming industry perspective at Zynga

Before we begin

Why does a(n online) company need analytics?

•To monitor its operations (data)
  •is the site/app online and functional?
  •is data flowing?
  •do we get alerted when something breaks?

•To monitor its business (information)
  •are top line metrics looking healthy?
  •are we on target for this week/month/quarter?

•To understand its business (knowledge)
  •how are metrics related?
  •what drives changes?

•To use knowledge strategically (insight)

So what about Zynga?

•Monitoring need is the same as everyone else's, plus:

•It's an app within an app (FB) within a browser
  •more places for things to break

•It's a huge operational challenge to keep everything running when millions are playing

•It's a content push model with a (really) fast release cycle

•Collecting all the data and keeping it flowing internally is also a huge challenge

•So all of that makes it imperative to stay on top of things 24/7

That's operations, but what about the business?

•Content driven means you have to monitor business metrics all the time as well!

•Best to have overlap with operational metrics
  •use raw counts for things like visits

•But also need calculated metrics with trends
  •engagement, retention, virality, reach

•Need a system that can handle this in real time and near real time

•Need human beings to run the system and use the data

Zynga's approach

•Robust, simple real-time system (memcache, MySQL)

•Robust, sophisticated, and scalable data warehousing solution (Vertica)

•In-house developed reporting platform
  •also includes easy to use A/B testing

•A rather large team of engineers and analysts
  •software tools and DB developers and admins
  •reporting analysts embedded in game studios
  •central analysts working with marketing etc.
  •a research team for deeper understanding

Real-time analytics (monitoring)

•Meant for quickly pushing raw data into a simple database without any calculations

•The point is to know when something is broken as soon as possible

•This is not a system for answers, it's a system for alerts!

•Throw a chart up on a monitor and watch it every few minutes
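
To illustrate the "alerts, not answers" idea above, here is a minimal sketch that compares the latest raw count against a short trailing baseline and flags a sharp drop. The function name, the window size, and the 50% threshold are hypothetical choices for illustration; the slides do not describe how Zynga's alerting is actually implemented.

```python
from statistics import mean

def should_alert(counts, window=5, drop_ratio=0.5):
    """Return True if the latest raw count is below drop_ratio times
    the mean of the `window` counts that precede it.

    `counts` is a list of raw per-interval event counts, oldest first.
    Nothing is derived or aggregated: the only question is
    "does this look broken right now?".
    """
    if len(counts) < window + 1:
        return False  # not enough history to judge
    baseline = mean(counts[-window - 1:-1])
    return counts[-1] < drop_ratio * baseline

# Hypothetical per-minute visit counts.
healthy = [1000, 980, 1020, 1010, 990, 1005]
broken = [1000, 980, 1020, 1010, 990, 120]

print(should_alert(healthy))
print(should_alert(broken))
```

The check works on raw counts only, which matches the point of the slide: it says when to look, not why the number moved.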

The big guns — Vertica!

•90+% of analytics happens here

•near real-time processed data
  •remove duplicates and such

•nightly aggregated data = warehouse

•Column storage ideal for huge datasets, where most work is performed on aggregated data

•Is scaling very nicely to large clusters

•Has very sophisticated SQL extensions

•Does have its quirks as well...
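
The two processing stages mentioned on this slide (duplicate removal on near real-time data, then nightly aggregation) can be sketched roughly as follows. In the setup described here this work would be done in the warehouse with SQL; plain Python is used only to show the shape of the steps, and the field names (`event_id`, `date`, `game`, `action`) are assumptions, not Zynga's schema.

```python
from collections import Counter

def dedupe(events):
    """Drop repeated events, keeping the first occurrence of each event_id."""
    seen = set()
    unique = []
    for event in events:
        if event["event_id"] not in seen:
            seen.add(event["event_id"])
            unique.append(event)
    return unique

def daily_rollup(events):
    """Count events per (date, game, action), the aggregated form
    that most downstream analysis would read."""
    return Counter((e["date"], e["game"], e["action"]) for e in events)

# Hypothetical raw events; event 1 was delivered twice.
raw = [
    {"event_id": 1, "date": "2011-09-08", "game": "g1", "action": "click"},
    {"event_id": 1, "date": "2011-09-08", "game": "g1", "action": "click"},
    {"event_id": 2, "date": "2011-09-08", "game": "g1", "action": "click"},
    {"event_id": 3, "date": "2011-09-08", "game": "g1", "action": "load"},
]

clean = dedupe(raw)
rollup = daily_rollup(clean)
for key, count in sorted(rollup.items()):
    print(key, count)
```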

Why Vertica? Why not Hadoop?

•Speed: often want to know things in near real time, not wait for a big map/reduce job to come back

•Synergy with the company: good to be the biggest client of a surging business. Our success is your success!

•Easier to find good (business) analysts with great SQL background, while map/reduce is often the domain of engineers and academics

•In the end, for practical rather than religious reasons :)

Data Warehouse(s)

•Production cluster runs the reporting and A/B testing platforms

•Mirror cluster for ad hoc analysis and deep dives

•1% sample cluster for order-of-magnitude calculations and for games like CityVille and FarmVille with too much data :)
  •not really useful for virals...

•Given the number of people accessing data and the amount of data recorded, very important to understand the limitations!
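
A 1% sample like the one described above is commonly built by hashing the user ID, so that a given user is always either in or out of the sample. The sketch below assumes that approach; the slides do not say how the sample cluster is actually populated, and all names here are hypothetical. The last lines show one plausible reason (an assumption, not stated in the slides) why such a sample is of little use for virals: a viral links two users, and both have to land in the sample.

```python
import hashlib

SAMPLE_PERCENT = 1  # matches the 1% sample described above

def in_sample(user_id, percent=SAMPLE_PERCENT):
    """Deterministically place a user in the sample using a hash of the ID."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent

def estimate_total(sample_count, percent=SAMPLE_PERCENT):
    """Scale a count measured on the sample up to the full population."""
    return sample_count * 100 // percent

# Hypothetical population of 200,000 user IDs.
users = range(200_000)
sampled = [u for u in users if in_sample(u)]
print(len(sampled), "sampled users, estimated total:", estimate_total(len(sampled)))

# A viral involves a sender and a recipient. If users are sampled
# independently, both are present for only this fraction of virals.
pair_fraction = (SAMPLE_PERCENT / 100) ** 2
print(f"fraction of sender/recipient pairs fully in the sample: {pair_fraction:.4%}")
```

Scaled-up counts from such a sample are good for order-of-magnitude answers only, which is in line with the slide's warning about understanding the limitations.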

How big is Big?

•Let's say a game has 10M DAU, some come multiple times

•Even a very short session will have 10s of recorded activities

  •game load tracking, assets loading, game state, clicks

•And then there are virals: FB feed posts and requests

•So all in all, 10s of billions of rows, several terabytes a day

•not unusual to pull a dataset of 1B rows
  •not something you dump into Excel :)
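
The back-of-envelope arithmetic behind these figures can be sketched for a single game as below. Only the 10M DAU comes from the slide; the sessions per user, events per session, and bytes per row are illustrative assumptions. The slide's totals presumably cover everything recorded, virals included, so they are larger than this single-game figure.

```python
def rows_per_day(dau, sessions_per_user, events_per_session):
    """Rough number of activity rows one game records in a day."""
    return dau * sessions_per_user * events_per_session

def gigabytes_per_day(rows, bytes_per_row):
    """Rough raw data volume for that many rows."""
    return rows * bytes_per_row / 1e9

# 10M DAU is from the slide; the other inputs are assumed for illustration.
rows = rows_per_day(dau=10_000_000, sessions_per_user=2, events_per_session=50)
size_gb = gigabytes_per_day(rows, bytes_per_row=100)
print(f"{rows:,} rows/day, about {size_gb:,.0f} GB/day for one game")
```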

In-house analytics

•Scale and data specifics make it hard to find canned solutions

•Want the ability to dig to arbitrary depth

•Want the ability to combine arbitrary data ad hoc

•Want to cater to a studio's specific needs

•Want to create a simple, scalable, usable system to:

  •minimize data sources that need reconciliation
  •minimize operational points of failure
  •minimize the number of steps involved in analysis

In-house analytics continued

•Need a balance of self-service and analyst support

•Simple reporting web portal with SQL queries wrapped in XML + basic Fusion Charts visualizations

  •created, maintained, and used by reporting analysts
  •available to everyone 24/7
  •everyone is looking at the same data!

•Analysts embedded directly into individual studios
  •"on the ground" understanding of each game
  •part of the fabric of the studio
  •yet leveraging the support of the wider analytics org

•Analysts in direct contact with infrastructure
  •solid understanding of the data flow + business needs
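
To give a feel for what "SQL queries wrapped in XML" might look like, here is a small sketch of a report definition and a runner for it. The XML layout, the table, and the column names are entirely hypothetical (the slides do not show the real format), and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical report definition: a query plus a hint for how to chart it.
REPORT_XML = """
<report name="daily_active_users">
  <chart type="line" x="day" y="dau"/>
  <sql>
    SELECT day, COUNT(DISTINCT user_id) AS dau
    FROM visits
    GROUP BY day
    ORDER BY day
  </sql>
</report>
"""

def run_report(xml_text, conn):
    """Parse a report definition, run its query, and return rows plus chart hints."""
    root = ET.fromstring(xml_text.strip())
    query = root.findtext("sql").strip()
    chart = dict(root.find("chart").attrib)
    rows = conn.execute(query).fetchall()
    return {"name": root.get("name"), "chart": chart, "rows": rows}

# Stand-in data; the real source would be the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (day TEXT, user_id INTEGER)")
conn.executemany("INSERT INTO visits VALUES (?, ?)", [
    ("2011-09-07", 1), ("2011-09-07", 2), ("2011-09-07", 1),
    ("2011-09-08", 1), ("2011-09-08", 2), ("2011-09-08", 3),
])

result = run_report(REPORT_XML, conn)
print(result["name"], result["chart"], result["rows"])
```

Keeping the query and the chart hints in one definition means every viewer of a report runs the same SQL, which fits the "everyone is looking at the same data" point above.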

Fine, so what is it all used for?

•Dashboards and reports

•MAU/WAU/DAU, user acquisition, daily/weekly retention, lapse/death, player engagement, virality, k-factor, levels, game actions, and of course revenues

•Distributions, trends, funnels, segmentation

•Combining metrics, understanding feature performance, user behavior, revenue successes and failures

•Adjusting quickly, learning from mistakes

•Deploying successes widely, planning ahead
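
A few of the metrics listed above can be sketched in code. The definitions below are common industry ones (distinct active users per day, day-N retention, and k-factor as invites per user times invite conversion rate); they are not necessarily the exact definitions used at Zynga, and the data layout is assumed for illustration.

```python
from datetime import date, timedelta

def dau(activity, day):
    """Distinct users active on `day`; `activity` is a list of (user_id, date) pairs."""
    return len({user for user, d in activity if d == day})

def day_n_retention(installs, activity, n):
    """Fraction of installed users who were active exactly n days after installing.

    `installs` maps user_id -> install date.
    """
    if not installs:
        return 0.0
    active = set(activity)
    retained = sum(
        1 for user, d0 in installs.items()
        if (user, d0 + timedelta(days=n)) in active
    )
    return retained / len(installs)

def k_factor(invites_sent, invites_accepted, users):
    """Invites sent per user times the fraction of invites accepted."""
    if users == 0 or invites_sent == 0:
        return 0.0
    return (invites_sent / users) * (invites_accepted / invites_sent)

# Hypothetical data: four installs on day 0, two of them return on day 1.
d0, d1 = date(2011, 9, 7), date(2011, 9, 8)
installs = {1: d0, 2: d0, 3: d0, 4: d0}
activity = [(1, d0), (2, d0), (3, d0), (4, d0), (1, d1), (2, d1), (1, d1)]

print("DAU on day 1:", dau(activity, d1))
print("Day-1 retention:", day_n_retention(installs, activity, 1))
print("k-factor:", k_factor(invites_sent=400, invites_accepted=50, users=100))
```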

Role of an analyst

•PMs can
  •track metrics for games/features
  •pull various reports when something is off
  •run "simple" ad hoc queries
  •create and run A/B tests

•But analysts can
  •bridge business and infrastructure
  •dig deeper into the data
  •combine huge datasets efficiently
  •apply their intuitive "feel" for big data
  •leverage each other's work