EVOLVING YOUR ANALYTICS STACK WITH YOUR BUSINESS

SNOWPLOW - LONDON MEETUP #4

How to evolve your analytics stack with your business using Snowplow

BUSINESSES ARE CONSTANTLY EVOLVING…

▸ Your products (apps & platforms) change

▸ Your questions should change too

▸ It’s critical that the analytics stack can evolve with your business

HOW?

SELF-DESCRIBING DATA + EVENT DATA MODELING = EVOLVING EVENT DATA PIPELINE

PART 1: SELF-DESCRIBING DATA

NO TWO COMPANIES ARE ALIKE

DEFINE YOUR OWN EVENTS AND ENTITIES

Events

‣ Build castle

‣ Form alliance

‣ Declare war

Entities

‣ Player

‣ Game

‣ Level

‣ Castle

Events

‣ View product

‣ Buy product

‣ Deliver product

Entities

‣ Product

‣ Customer

‣ Basket

‣ Vehicle

YOU THEN DEFINE A SCHEMA FOR EACH EVENT AND ENTITY

{
  "description": "Schema for a fighter context",
  "vendor": "com.ufc",
  "name": "fighter",
  "version": "1-0-2",
  "properties": {
    "FirstName": {"type": "string"},
    "LastName": {"type": "string"},
    "Nickname": {"type": "string"},
    "FacebookProfile": {"type": "string"},
    "WeightLbs": {"type": ["integer", "null"]},
    "Record": {"type": "string", "pattern": "^[0-9]+-[0-9]+-[0-9]+$"}
  }
}

I DON’T DO EVENTS THAT AREN’T SCHEMA’ED

{
  "schema": "iglu:com.ufc/fighter/jsonschema/1-0-2",
  "data": {
    "FirstName": "Daniel",
    "LastName": "Cormier",
    "Nickname": "DC",
    "FacebookProfile": "Daniel-Cormier",
    "TwitterName": "dc_mma",
    "WeightLbs": 205
  }
}
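
The `schema` field is what makes the data self-describing: it follows the pattern `iglu:vendor/name/format/version`, so any consumer can work out which schema a payload claims to conform to. A minimal sketch of reading that reference in Python follows. The `SchemaKey` type and the `parse_iglu_uri` helper are hypothetical names used for illustration, not part of any Snowplow library, and the vendor is assumed to match the schema's `com.ufc`.

```python
from typing import NamedTuple


class SchemaKey(NamedTuple):
    """The four parts of a schema reference (hypothetical helper type)."""
    vendor: str
    name: str
    format: str
    version: str


def parse_iglu_uri(uri: str) -> SchemaKey:
    """Split 'iglu:vendor/name/format/version' into its four parts."""
    prefix = "iglu:"
    if not uri.startswith(prefix):
        raise ValueError(f"not an iglu URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected vendor/name/format/version in {uri!r}")
    return SchemaKey(*parts)


# A self-describing payload: the data travels with a pointer to its schema.
event = {
    "schema": "iglu:com.ufc/fighter/jsonschema/1-0-2",
    "data": {"FirstName": "Daniel", "LastName": "Cormier", "WeightLbs": 205},
}

key = parse_iglu_uri(event["schema"])
print(key.vendor, key.name, key.version)  # com.ufc fighter 1-0-2
```

Because the reference travels with every payload, a consumer never has to guess which version of the fighter entity it is looking at.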

THE SCHEMAS CAN THEN BE USED IN A NUMBER OF WAYS

▸ Validate the data (important for data quality)

▸ Load the data into tidy tables in your data warehouse

▸ Make it easy and safe to write downstream data processing applications (e.g. for real-time users)
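
As a rough illustration of the first use, the sketch below checks a fighter payload against the `type` and `pattern` rules from the fighter schema. It is a deliberately simplified, standard-library-only checker written for this example. It is not a full JSON Schema validator, and `validate` is a hypothetical helper rather than a Snowplow API; a production pipeline would rely on a complete validator instead.

```python
import re

# The "properties" block of the fighter schema, as a Python dict.
FIGHTER_SCHEMA = {
    "properties": {
        "FirstName": {"type": "string"},
        "LastName": {"type": "string"},
        "Nickname": {"type": "string"},
        "FacebookProfile": {"type": "string"},
        "WeightLbs": {"type": ["integer", "null"]},
        "Record": {"type": "string", "pattern": "^[0-9]+-[0-9]+-[0-9]+$"},
    }
}

# Only the three JSON types used by the schema above are supported here.
_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "null": lambda v: v is None,
}


def validate(data, schema):
    """Return a list of error messages; an empty list means the data passed."""
    errors = []
    for field, rules in schema["properties"].items():
        if field not in data:
            continue  # assumption: every field is optional in this sketch
        value = data[field]
        allowed = rules["type"]
        if isinstance(allowed, str):
            allowed = [allowed]
        if not any(_TYPE_CHECKS[t](value) for t in allowed):
            errors.append(
                f"{field}: expected {' or '.join(allowed)}, got {type(value).__name__}"
            )
            continue
        pattern = rules.get("pattern")
        if pattern and isinstance(value, str) and not re.search(pattern, value):
            errors.append(f"{field}: {value!r} does not match {pattern}")
    return errors


good = {"FirstName": "Daniel", "LastName": "Cormier", "WeightLbs": 205}
bad = {"FirstName": "Daniel", "WeightLbs": "205", "Record": "many wins"}

print(validate(good, FIGHTER_SCHEMA))  # []
for message in validate(bad, FIGHTER_SCHEMA):
    print(message)
```

Payloads that fail the check can be set aside for inspection rather than loaded, which is how schemas protect data quality downstream.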

PART 2: EVENT DATA MODELING

WHAT IS EVENT DATA MODELING?

▸ Event data modeling is the process of using business logic to aggregate over event-level data to produce 'modeled' data that is simpler for querying.
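
One common piece of business logic is sessionization: grouping each user's events into sessions separated by a period of inactivity. The sketch below shows the idea in Python. The field names (`user_id`, `ts`), the 30-minute timeout, and the `sessionize` function are all assumptions made for this example, not a description of how any particular pipeline does it.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed business rule


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Aggregate event-level rows into one row per session."""
    by_user = defaultdict(list)
    for event in events:
        by_user[event["user_id"]].append(event)

    sessions = []
    for user_id, user_events in by_user.items():
        user_events.sort(key=lambda e: e["ts"])
        current = None
        for event in user_events:
            # Start a new session on the first event or after a long gap.
            if current is None or event["ts"] - current["end"] > timeout:
                current = {
                    "user_id": user_id,
                    "start": event["ts"],
                    "end": event["ts"],
                    "event_count": 0,
                }
                sessions.append(current)
            current["end"] = event["ts"]
            current["event_count"] += 1
    return sessions


events = [
    {"user_id": "u1", "ts": datetime(2016, 1, 1, 9, 0)},
    {"user_id": "u1", "ts": datetime(2016, 1, 1, 9, 10)},
    {"user_id": "u1", "ts": datetime(2016, 1, 1, 10, 30)},
    {"user_id": "u2", "ts": datetime(2016, 1, 1, 9, 5)},
]

sessions = sessionize(events)
print(len(sessions))  # 3
```

The four raw events collapse into three session rows, which are far easier to query than the event stream itself. The timeout is an opinion encoded in the model, and a different business might choose a different one.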

MODELED VS UNMODELED DATA

Unmodeled data: event 1 … event n

IMMUTABLE. UNOPINIONATED. HARD TO CONSUME. NOT

Modeled data: Users, Sessions, Funnels

MUTABLE AND OPINIONATED. EASY TO CONSUME.

IN GENERAL, EVENT DATA MODELING IS PERFORMED ON THE COMPLETE EVENT STREAM

▸ Late arriving events can change the way you understand earlier arriving events

▸ If we change our data models, running them over the complete event stream lets us recompute historical data based on the new model
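
Both points can be seen in a small Python sketch that rebuilds a derived users table from scratch on every run. The event names, the `first_purchase` and `active` columns, and the `active_threshold` parameter are hypothetical and chosen only to make the example concrete.

```python
from datetime import datetime


def model_users(events, active_threshold=2):
    """Rebuild the whole derived users table from the full event stream."""
    users = {}
    for event in sorted(events, key=lambda e: e["ts"]):
        row = users.setdefault(
            event["user_id"], {"events": 0, "first_purchase": None}
        )
        row["events"] += 1
        if event["name"] == "buy_product" and row["first_purchase"] is None:
            row["first_purchase"] = event["ts"]
    for row in users.values():
        row["active"] = row["events"] >= active_threshold
    return users


# The raw stream is append-only: events are never edited in place.
stream = [
    {"user_id": "u1", "name": "view_product", "ts": datetime(2016, 1, 1, 9, 0)},
    {"user_id": "u1", "name": "buy_product", "ts": datetime(2016, 1, 3, 9, 0)},
]
v1 = model_users(stream)

# A late-arriving event with an earlier timestamp changes what we
# previously believed about the user's first purchase.
stream.append(
    {"user_id": "u1", "name": "buy_product", "ts": datetime(2016, 1, 2, 9, 0)}
)
v2 = model_users(stream)

# A change to the model itself (a stricter definition of "active")
# is applied to all history simply by recomputing.
v3 = model_users(stream, active_threshold=5)

print(v1["u1"]["first_purchase"])  # 2016-01-03 09:00:00
print(v2["u1"]["first_purchase"])  # 2016-01-02 09:00:00
print(v3["u1"]["active"])          # False
```

An incremental model that only processed new events would have kept the wrong first purchase date; recomputing over the whole stream corrects it without any special handling.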

PART 3: EVOLVING THE DATA PIPELINE

HOW DO WE HANDLE PIPELINE EVOLUTION?

▸ Businesses change over time

▸ The events that occur are going to change

▸ Use of the data will change

▸ Insight -> more questions -> more insight -> more questions

▸ Two types of evolution: push and pull

BUSINESSES ARE NOT STATIC, SO EVENT PIPELINES SHOULD NOT BE EITHER

PUSH EXAMPLE:

▸ If data is self-describing, it is easy to add additional sources

▸ Self-describing data is good for managing bad data and pipeline evolution

I’M AN EMAIL SEND EVENT AND I HAVE INFORMATION ABOUT THE RECIPIENT (EMAIL

ANSWERING THE QUESTION:

1. EXISTING DATA MODEL SUPPORTS ANSWER

2. NEED TO UPDATE DATA MODEL

3. NEED TO UPDATE DATA MODEL AND DATA COLLECTION

SELF-DESCRIBING DATA AND THE ABILITY TO RECOMPUTE DATA MODELS ARE ESSENTIAL TO ENABLE PIPELINE EVOLUTION

SELF-DESCRIBING DATA

‣ Update existing events and entities in a backward-compatible way, e.g. add optional new fields

‣ Update existing events and entities in a backward-incompatible way, e.g. change field types, remove fields, add compulsory fields

‣ Add new event and entity types

RECOMPUTE DATA MODELS ON ENTIRE DATA SET

‣ Add new columns to existing derived tables, e.g. add new audience segmentation

‣ Change the way existing derived tables are generated, e.g. change sessionization logic

‣ Create new derived tables
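
The difference between compatible and incompatible schema updates can be sketched as a comparison of two schema versions. The Python below is an illustrative assumption, not a Snowplow tool: `breaking_changes` is a hypothetical helper that flags only the three incompatible cases named above (removed fields, changed field types, new compulsory fields), and it conservatively treats any change to a field's type as incompatible.

```python
def breaking_changes(old, new):
    """List the reasons `new` is not backward compatible with `old`.

    An empty list means the update is compatible under the three
    rules checked here.
    """
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    reasons = []
    for field, rules in old_props.items():
        if field not in new_props:
            reasons.append(f"removed field {field}")
        elif rules.get("type") != new_props[field].get("type"):
            reasons.append(f"changed type of {field}")
    added_required = set(new.get("required", [])) - set(old.get("required", []))
    for field in sorted(added_required):
        reasons.append(f"added compulsory field {field}")
    return reasons


current = {
    "properties": {
        "FirstName": {"type": "string"},
        "WeightLbs": {"type": ["integer", "null"]},
    }
}

# Compatible: one optional new field, nothing else touched.
with_twitter = {
    "properties": {
        "FirstName": {"type": "string"},
        "WeightLbs": {"type": ["integer", "null"]},
        "TwitterName": {"type": "string"},
    }
}

# Incompatible: a field changes type and a new field becomes compulsory.
reworked = {
    "properties": {
        "FirstName": {"type": "string"},
        "WeightLbs": {"type": "string"},
        "Nickname": {"type": "string"},
    },
    "required": ["Nickname"],
}

print(breaking_changes(current, with_twitter))  # []
print(breaking_changes(current, reworked))
```

A check like this could decide how a schema's version number should move: a compatible update can keep reading historical data, while an incompatible one signals that older events will not validate against the new schema.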

QUESTIONS?
