Snowplow: evolve your analytics stack with your business
Snowplow Meetup San Francisco, Feb 2017


Page 1

Snowplow: evolve your analytics stack with your business

Snowplow Meetup San Francisco, Feb 2017

Page 2

Our businesses are constantly evolving…

• Our digital products (apps and platforms) are constantly developing

• The questions we ask of our data are constantly changing

• It is critical that our analytics stack can evolve with our business

Page 3

Self-describing data + Event data modeling = an analytics stack that evolves with your business

How Snowplow users evolve their analytics stacks with their business

Page 4

Self-describing data: overview

Page 5

Event data varies widely by company

Page 6

As a Snowplow user, you can define your own events and entities

Events:

• Build castle • Form alliance • Declare war (gaming)
• View product • Buy product • Deliver product (e-commerce)

Entities (contexts):

• Player • Game • Level • Currency (gaming)
• Product • Customer • Basket • Delivery van (e-commerce)
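As a concrete sketch (in Python, with an illustrative com.acme vendor and made-up schema names and fields, not real published schemas), one of the e-commerce events above might be represented, together with its attached entities, as self-describing JSONs of the kind shown on Page 8:

    # Hypothetical "buy product" event with "product" and "customer"
    # entities attached as contexts. Vendor (com.acme), schema names,
    # and fields are all illustrative.
    buy_product_event = {
        "schema": "iglu:com.acme/buy_product/jsonschema/1-0-0",
        "data": {"order_id": "o-1234", "total": 59.99}
    }

    attached_entities = [
        {
            "schema": "iglu:com.acme/product/jsonschema/1-0-0",
            "data": {"sku": "sku-001", "name": "Red shoes", "price": 59.99}
        },
        {
            "schema": "iglu:com.acme/customer/jsonschema/1-0-0",
            "data": {"customer_id": "c-042", "segment": "returning"}
        }
    ]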

Page 7

You then define a schema for each event and entity

{ "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", "description": "Schema for a fighter context", "self": { "vendor": "com.ufc", "name": "fighter_context", "format": "jsonschema", "version": "1-0-1" },

"type": "object", "properties": { "FirstName": { "type": "string" }, "LastName": { "type": "string" }, "Nickname": { "type": "string" }, "FacebookProfile": { "type": "string" }, "TwitterName": { "type": "string" }, "GooglePlusProfile": { "type": "string" },

"HeightFormat": { "type": "string" }, "HeightCm": { "type": ["integer", "null"] }, "Weight": { "type": ["integer", "null"] }, "WeightKg": { "type": ["integer", "null"] }, "Record": { "type": "string", "pattern": "^[0-9]+-[0-9]+-[0-9]+$" }, "Striking": { "type": ["number", "null"], "maxdecimal": 15 }, "Takedowns": { "type": ["number", "null"], "maxdecimal": 15 }, "Submissions": { "type": ["number", "null"], "maxdecimal": 15 }, "LastFightUrl": { "type": "string" },

"LastFightEventText": { "type": "string" }, "NextFightUrl": { "type": "string" }, "NextFightEventText": { "type": "string" }, "LastFightDate": { "type": "string", "format": "timestamp" } }, "additionalProperties": false }

Then upload the schema to Iglu, Snowplow's schema registry.

Page 8

Then send data into Snowplow as self-describing JSONs

(Pipeline steps: 1. Validation → 2. Dimension widening → 3. Data modeling)

Event (the "schema" field is the schema reference):

    {
      "schema": "iglu:com.israel365/temperature_measure/jsonschema/1-0-0",
      "data": {
        "timestamp": "2016-11-16 19:53:21",
        "location": "Berlin",
        "temperature": 3,
        "units": "Centigrade"
      }
    }

{ "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#", "description": "Schema for an ad impression event", "self": { "vendor": “com.israel365", "name": “temperature_measure", "format": "jsonschema", "version": "1-0-0" }, "type": "object",

"properties": { "timestamp": { "type": "string" }, "location": { "type": "string" }, … }, … }

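As a sketch of what sending such an event might look like from application code, here is the rough shape using Snowplow's Python tracker (snowplow-tracker on PyPI); the collector endpoint is a placeholder and method names may differ across tracker versions:

    # Rough sketch with the Python tracker; "collector.example.com" is a
    # placeholder for your own collector endpoint.
    from snowplow_tracker import Emitter, SelfDescribingJson, Tracker

    emitter = Emitter("collector.example.com")
    tracker = Tracker(emitter)

    # Send the temperature_measure event as a self-describing JSON
    tracker.track_self_describing_event(SelfDescribingJson(
        "iglu:com.israel365/temperature_measure/jsonschema/1-0-0",
        {"timestamp": "2016-11-16 19:53:21", "location": "Berlin",
         "temperature": 3, "units": "Centigrade"}
    ))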

Page 9

The schemas can then be used in a number of ways

• Validate the data (important for data quality)

• Load the data into tidy tables in your data warehouse

• Make it easy and safe to write downstream data processing applications (e.g. for real-time users)
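As an illustration of the validation point, here is a minimal sketch using the generic Python jsonschema library (Snowplow performs validation inside the pipeline itself; this just shows the principle, against a simplified version of the temperature_measure schema):

    # Validate instances against a (simplified) temperature_measure schema.
    import jsonschema

    schema = {
        "type": "object",
        "properties": {
            "timestamp": {"type": "string"},
            "location": {"type": "string"},
            "temperature": {"type": ["integer", "null"]}
        },
        "required": ["timestamp", "location"],
        "additionalProperties": False
    }

    good = {"timestamp": "2016-11-16 19:53:21", "location": "Berlin", "temperature": 3}
    bad = {"timestamp": "2016-11-16 19:53:21", "loc": "Berlin"}

    jsonschema.validate(good, schema)    # passes silently
    # jsonschema.validate(bad, schema)   # raises ValidationError, so bad
    #                                    # data can be quarantined, not loaded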

Page 10

Event data modeling: overview

Page 11

What is event data modeling?

(Pipeline steps: 1. Validation → 2. Dimension widening → 3. Data modeling)

Event data modeling is the process of using business logic to aggregate over event-level data to produce 'modeled' data that is easier to query.
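For example, sessionization is a classic piece of event data modeling. A toy Python sketch (the field names and the 30-minute inactivity timeout are illustrative business logic, not anything prescribed by Snowplow):

    # Roll event-level data up into one row per session, using a
    # 30-minute inactivity timeout -- exactly the kind of opinionated
    # business logic that data modeling encodes.
    from datetime import timedelta

    TIMEOUT = timedelta(minutes=30)

    def sessionize(events):
        """events: dicts with 'user_id' and 'timestamp' (datetime),
        pre-sorted by (user_id, timestamp); returns one dict per session."""
        sessions = []
        for e in events:
            last = sessions[-1] if sessions else None
            if (last is not None
                    and last["user_id"] == e["user_id"]
                    and e["timestamp"] - last["end"] <= TIMEOUT):
                # Same user, within the timeout: extend the current session
                last["end"] = e["timestamp"]
                last["n_events"] += 1
            else:
                # New user or gap over 30 minutes: open a new session
                sessions.append({"user_id": e["user_id"],
                                 "start": e["timestamp"],
                                 "end": e["timestamp"],
                                 "n_events": 1})
        return sessions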

Page 12

Unmodeled data (event 1 … event n):
• Immutable
• Unopinionated
• Hard to consume
• Not contentious

Modeled data (users, sessions, funnels):
• Mutable and opinionated
• Easy to consume
• May be contentious

Page 13

In general, event data modeling is performed on the complete event stream

• Late-arriving events can change the way you understand earlier-arriving events

• If we change our data models, this gives us the flexibility to recompute historical data based on the new model
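Concretely, because the event log is immutable, a model change is just a re-run over the full history. A minimal sketch, reusing the sessionize() function from the earlier example:

    # Rebuild derived data from scratch: sort the complete (immutable)
    # event log -- which naturally folds in late-arriving events -- and
    # re-apply the (possibly updated) model.
    def rebuild(all_events, model):
        ordered = sorted(all_events, key=lambda e: (e["user_id"], e["timestamp"]))
        return model(ordered)

    # e.g. sessions = rebuild(all_events, sessionize)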

Page 14

The evolving event data pipeline

Page 15

How do we handle pipeline evolution?

PUSH FACTORS: what is being tracked will change over time

PULL FACTORS: what questions are being asked of the data will change over time

Businesses are not static, so event pipelines should not be either

[Diagram: event sources (web, apps, servers, comms channels, push notifications, smart car / home, …) feed into collection and processing, which feed both a data warehouse (for data exploration, predictive modeling, and real-time dashboards) and real-time, data-driven applications (bidding, vouchers, personalization, …).]

Page 16

Push example: new source of event data

• If data is self-describing, it is easy to add additional sources

• Self-describing data is good for managing bad data and pipeline evolution

"I'm an email send event and I have information about the recipient (email address, customer ID) and the email (id, tags, variation)."
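Such an event might arrive as a self-describing JSON along these lines (the vendor, schema name, and field names are hypothetical, chosen to match the description above):

    # Hypothetical self-describing "email send" event carrying the
    # recipient and email information described above.
    email_send_event = {
        "schema": "iglu:com.acme/email_send/jsonschema/1-0-0",
        "data": {
            "recipient_email": "jane@example.com",
            "customer_id": "c-042",
            "email_id": "welcome-01",
            "tags": ["onboarding", "welcome"],
            "variation": "B"
        }
    }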

Page 17

Pull example: new business question

[Diagram: the question → answer → insight loop; each new insight prompts the next question.]

Page 18

Answering the question: 3 possibilities

1. Existing data model supports answer:
• Possible to answer the question with existing modeled data

2. Need to update data model:
• Data collected already supports the answer
• Additional computation required in the data modeling step (additional logic)

3. Need to update data model and data collection:
• Need to extend event tracking
• Need to update data models to incorporate the additional data (and potentially additional logic)

Page 19

Self-describing data and the ability to recompute data models are essential to enable pipeline evolution

Self-describing data lets you:

• Update existing events and entities in a backward-compatible way, e.g. add optional new fields
• Update existing events and entities in a backwards-incompatible way, e.g. change field types, remove fields, add compulsory fields
• Add new event and entity types

Recomputing data models on the entire data set lets you:

• Add new columns to existing derived tables, e.g. add a new audience segmentation
• Change the way existing derived tables are generated, e.g. change sessionization logic
• Create new derived tables
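Snowplow distinguishes these cases with SchemaVer version numbers (MODEL-REVISION-ADDITION). A sketch with deliberately truncated, illustrative schemas based on the fighter_context example from Page 7:

    # 1-0-0: original entity schema (truncated for brevity)
    fighter_1_0_0 = {"properties": {"FirstName": {"type": "string"}}}

    # 1-0-1: backward-compatible change (add an optional field) bumps the
    # ADDITION component; data valid against 1-0-0 is still valid here
    fighter_1_0_1 = {"properties": {"FirstName": {"type": "string"},
                                    "Reach": {"type": ["integer", "null"]}}}

    # 2-0-0: backwards-incompatible change (a field's type changed) bumps
    # the MODEL component, so old and new data can live in separate tables
    fighter_2_0_0 = {"properties": {"FirstName": {"type": "string"},
                                    "Reach": {"type": "string"}}}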

Page 20

Questions?