giuseppe-gaviani
SNOWPLOW - LONDON MEETUP #4
BUSINESSES ARE CONSTANTLY EVOLVING…
▸ Your products (apps & platforms) change
▸ Your questions should change too
▸ It’s critical that the analytics stack can evolve with your business
HOW?
SELF-DESCRIBING DATA + EVENT DATA MODELING = EVOLVING EVENT DATA PIPELINE
DEFINE YOUR OWN EVENTS AND ENTITIES
Events
‣ Build castle
‣ Form alliance
‣ Declare war
Entities
‣ Player
‣ Game
‣ Level
‣ Castle

Events
‣ View product
‣ Buy product
‣ Deliver product
Entities
‣ Product
‣ Customer
‣ Basket
‣ Vehicle
YOU THEN DEFINE A SCHEMA FOR EACH EVENT AND ENTITY
{
  "description": "Schema for a fighter context",
  "vendor": "com.ufc",
  "name": "fighter",
  "version": "1-0-2",
  "properties": {
    "FirstName": {"type": "string"},
    "LastName": {"type": "string"},
    "Nickname": {"type": "string"},
    "FacebookProfile": {"type": "string"},
    "WeightLbs": {"type": ["integer", "null"]},
    "Record": {"type": "string", "pattern": "^[0-9]+-[0-9]+-[0-9]+$"}
  }
}
I DON’T DO EVENTS THAT AREN’T SCHEMA’ED
{
  "schema": "iglu:com.ufc/fighter/jsonschema/1-0-2",
  "data": {
    "FirstName": "Daniel",
    "LastName": "Cormier",
    "Nickname": "DC",
    "FacebookProfile": "Daniel-Cormier",
    "TwitterName": "dc_mma",
    "WeightLbs": 205
  }
}
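A minimal sketch of how a self-describing event can be checked against its schema. It is not Snowplow's validator: it only understands the `type` and `pattern` keywords used on the slide, and the `SCHEMAS` lookup table and `validate` function are hypothetical names introduced for illustration. The schema key is assumed to have the form iglu:vendor/name/format/version.

```python
import re

# Hypothetical in-memory schema registry, keyed by the schema URI
# (assumed form: iglu:vendor/name/format/version).
SCHEMAS = {
    "iglu:com.ufc/fighter/jsonschema/1-0-2": {
        "properties": {
            "FirstName": {"type": "string"},
            "LastName": {"type": "string"},
            "Nickname": {"type": "string"},
            "FacebookProfile": {"type": "string"},
            "WeightLbs": {"type": ["integer", "null"]},
            "Record": {"type": "string", "pattern": "^[0-9]+-[0-9]+-[0-9]+$"},
        }
    }
}


def _matches(value, json_type):
    """Check a Python value against one JSON type name (subset only)."""
    if json_type == "integer":
        # bool is a subclass of int in Python, but not an integer in JSON.
        return isinstance(value, int) and not isinstance(value, bool)
    if json_type == "string":
        return isinstance(value, str)
    if json_type == "null":
        return value is None
    return False


def validate(event):
    """Return a list of error strings; an empty list means the event passed."""
    schema = SCHEMAS.get(event.get("schema"))
    if schema is None:
        return [f"unknown schema: {event.get('schema')}"]
    errors = []
    for field, value in event["data"].items():
        rule = schema["properties"].get(field)
        if rule is None:
            continue  # fields the schema does not name are allowed here
        allowed = rule["type"] if isinstance(rule["type"], list) else [rule["type"]]
        if not any(_matches(value, t) for t in allowed):
            errors.append(f"{field}: expected {allowed}")
        elif "pattern" in rule and isinstance(value, str) \
                and not re.search(rule["pattern"], value):
            errors.append(f"{field}: does not match {rule['pattern']}")
    return errors


event = {
    "schema": "iglu:com.ufc/fighter/jsonschema/1-0-2",
    "data": {"FirstName": "Daniel", "LastName": "Cormier", "WeightLbs": 205},
}
print(validate(event))
```

Because the event names its own schema, the validator needs no out-of-band knowledge about which rules apply to which payload.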
THE SCHEMAS CAN THEN BE USED IN A NUMBER OF WAYS
▸ Validate the data (important for data quality)
▸ Load the data into tidy tables in your data warehouse
▸ Make it easy and safe to write downstream data processing applications (e.g. for real-time users)
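To illustrate the "tidy tables" point, here is a rough sketch of deriving a table definition from a schema. The naming rule (vendor, name and major version joined with underscores), the snake_case column names and the `SQL_TYPES` mapping are assumptions made for this example, not a description of any particular loader.

```python
import re

FIGHTER_SCHEMA = {
    "vendor": "com.ufc",
    "name": "fighter",
    "version": "1-0-2",
    "properties": {
        "FirstName": {"type": "string"},
        "WeightLbs": {"type": ["integer", "null"]},
    },
}

# Assumed mapping from JSON types to warehouse column types.
SQL_TYPES = {"string": "VARCHAR", "integer": "BIGINT"}


def table_name(vendor, name, version):
    """One table per schema and major version, e.g. com_ufc_fighter_1."""
    major = version.split("-")[0]
    return f"{vendor.replace('.', '_')}_{name}_{major}"


def column_name(field):
    """Convert a field such as 'WeightLbs' to 'weight_lbs'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", field).lower()


def create_table(schema):
    """Build a CREATE TABLE statement from the schema's properties."""
    columns = []
    for field, rule in schema["properties"].items():
        types = rule["type"] if isinstance(rule["type"], list) else [rule["type"]]
        base = next(t for t in types if t != "null")  # first non-null type
        columns.append(f"  {column_name(field)} {SQL_TYPES[base]}")
    name = table_name(schema["vendor"], schema["name"], schema["version"])
    return f"CREATE TABLE {name} (\n" + ",\n".join(columns) + "\n);"


print(create_table(FIGHTER_SCHEMA))
```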
WHAT IS EVENT DATA MODELING?
▸ Event data modeling is the process of using business logic to aggregate over event-level data to produce 'modeled' data that is simpler for querying.
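As a concrete case of "business logic aggregating over event-level data", the sketch below rolls raw events up into sessions. The 30-minute inactivity timeout and the field names (`user_id`, `ts` in seconds) are assumptions chosen for the example; the timeout is exactly the kind of opinionated rule a data model encodes.

```python
from itertools import groupby
from operator import itemgetter

SESSION_TIMEOUT = 30 * 60  # assumed business rule: 30 minutes of inactivity


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Aggregate event-level rows into one row per session."""
    sessions = []
    ordered = sorted(events, key=itemgetter("user_id", "ts"))
    for user_id, user_events in groupby(ordered, key=itemgetter("user_id")):
        current = None
        for event in user_events:
            # Start a new session on the first event or after a long gap.
            if current is None or event["ts"] - current["end"] > timeout:
                current = {"user_id": user_id, "start": event["ts"],
                           "end": event["ts"], "events": 0}
                sessions.append(current)
            current["end"] = event["ts"]
            current["events"] += 1
    return sessions


events = [
    {"user_id": "u1", "ts": 0},
    {"user_id": "u1", "ts": 600},
    {"user_id": "u1", "ts": 4000},
    {"user_id": "u2", "ts": 100},
]
for session in sessionize(events):
    print(session)
```

The output table has one row per session, which is far easier to query than the raw event stream.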
MODELED VS UNMODELED DATA
Unmodeled data: event 1 … event n
IMMUTABLE. UNOPINIONATED. HARD TO CONSUME. NOT
Modeled data: Users, Sessions, …, Funnels
MUTABLE AND OPINIONATED. EASY TO CONSUME.
IN GENERAL, EVENT DATA MODELING IS PERFORMED ON THE COMPLETE EVENT STREAM
▸ Late arriving events can change the way you understand earlier arriving events
▸ If we change our data models, we can recompute historical data based on the new model
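Both bullets can be seen in a small sketch that rebuilds a users table from the complete event stream on every run. The event names, the field names and the `is_customer` rule are hypothetical; the point is only that the modeled table is a pure function of the immutable events plus the current model.

```python
def model_users(events, is_customer):
    """Rebuild the whole users table from scratch from the full event stream."""
    users = {}
    for event in sorted(events, key=lambda e: e["ts"]):
        user = users.setdefault(
            event["user_id"],
            {"first_seen": event["ts"], "events": 0, "customer": False},
        )
        user["events"] += 1
        user["customer"] = user["customer"] or is_customer(event)
    return users


def bought(event):
    """Model v1: a customer is anyone who bought a product."""
    return event["name"] == "buy_product"


def received(event):
    """Model v2: a customer is anyone who had a product delivered."""
    return event["name"] == "deliver_product"


events = [
    {"user_id": "u1", "ts": 100, "name": "view_product"},
    {"user_id": "u1", "ts": 200, "name": "buy_product"},
]
print(model_users(events, bought))

# A late-arriving event with an earlier timestamp changes first_seen.
late = {"user_id": "u1", "ts": 50, "name": "view_product"}
print(model_users(events + [late], bought))

# Changing the model and recomputing rewrites history under the new rule.
print(model_users(events + [late], received))
```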
HOW DO WE HANDLE PIPELINE EVOLUTION?
▸ Businesses change over time
▸ The events that occur are going to change
▸ Use of the data will change
▸ Insight -> more questions -> more insight -> more questions
▸ Two types of evolution: push and pull
BUSINESSES ARE NOT STATIC, SO EVENT PIPELINES SHOULD NOT BE EITHER
PUSH EXAMPLE:
▸ If data is self-describing, it is easy to add additional sources
▸ Self-describing data is good for managing bad data and pipeline evolution
I’M AN EMAIL SEND EVENT AND I HAVE INFORMATION ABOUT THE RECIPIENT (EMAIL
ANSWERING THE QUESTION:
1. EXISTING DATA MODEL SUPPORTS ANSWER
2. NEED TO UPDATE DATA MODEL
3. NEED TO UPDATE DATA MODEL AND DATA COLLECTION
SELF-DESCRIBING DATA AND THE ABILITY TO RECOMPUTE DATA MODELS ARE ESSENTIAL TO ENABLE PIPELINE EVOLUTION
SELF-DESCRIBING DATA
‣ Update existing events and entities in a backward-compatible way, e.g. add optional new fields
‣ Update existing events and entities in a backward-incompatible way, e.g. change field types, remove fields, add compulsory fields
‣ Add new event and entity types
RECOMPUTE DATA MODELS ON ENTIRE DATA SET
‣ Add new columns to existing derived tables, e.g. add new audience segmentation
‣ Change the way existing derived tables are generated, e.g. change sessionization logic
‣ Create new derived tables
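The difference between compatible and incompatible schema updates can be made mechanical. The sketch below classifies a change between two schema versions and bumps a three-part version such as 1-0-2 accordingly, loosely following the MODEL-REVISION-ADDITION reading of that version string. The `classify` rules are a simplification assumed for this example (the middle REVISION part is never bumped here), and both function names are hypothetical.

```python
def classify(old, new):
    """Label a schema change as 'breaking', 'addition' or 'none'."""
    old_props, new_props = old["properties"], new["properties"]
    removed = old_props.keys() - new_props.keys()
    retyped = {f for f in old_props.keys() & new_props.keys()
               if old_props[f] != new_props[f]}
    newly_required = set(new.get("required", [])) - set(old.get("required", []))
    if removed or retyped or newly_required:
        return "breaking"  # remove fields, change types, add compulsory fields
    if new_props.keys() - old_props.keys():
        return "addition"  # only optional new fields were added
    return "none"


def bump(version, change):
    """Return the next version string for a classified change."""
    model, revision, addition = (int(part) for part in version.split("-"))
    if change == "breaking":
        return f"{model + 1}-0-0"
    if change == "addition":
        return f"{model}-{revision}-{addition + 1}"
    if change == "none":
        return version
    raise ValueError(f"unknown change type: {change}")


old = {"properties": {"Nickname": {"type": "string"}}}
optional_field = {"properties": {"Nickname": {"type": "string"},
                                 "TwitterName": {"type": "string"}}}
retyped_field = {"properties": {"Nickname": {"type": "integer"}}}

print(bump("1-0-2", classify(old, optional_field)))
print(bump("1-0-2", classify(old, retyped_field)))
```

Keeping the major version in the schema key lets downstream consumers decide which historical data they can still read after a breaking change.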