Snowplow - Evolve your analytics stack with your business

<ul><li><p>Snowplow: evolve your analytics stack with your business</p><p>Snowplow Meetup Berlin Nov 2016</p></li>
<li><p>Our businesses are constantly evolving</p><p>Our digital products (apps and platforms) are constantly developing</p><p>The questions we ask of our data are constantly changing</p><p>It is critical that the analytics stack can evolve with your business</p></li>
<li><p>Self-describing data + Event data modeling</p><p>Event data pipeline that evolves with your business</p><p>How Snowplow users evolve their analytics stacks with their business</p></li>
<li><p>Self-describing data: Overview</p></li>
<li><p>Event data varies widely by company</p></li>
<li><p>As a Snowplow user, you can define your own events and entities</p><p>Events: Build castle, Form alliance, Declare war</p><p>Entities (contexts): Player, Game, Level, Currency</p><p>Events: View product, Buy product, Deliver product</p><p>Entities (contexts): Product, Customer, Basket, Delivery van</p></li>
<li><p>You then define a schema for each event and entity</p><p>{ "$schema": "", "description": "Schema for a fighter context", "self": { "vendor": "com.ufc", "name": "fighter_context", "format": "jsonschema", "version": "1-0-1" }, "type": "object", "properties": { "FirstName": { "type": "string" }, "LastName": { "type": "string" }, "Nickname": { "type": "string" }, "FacebookProfile": { "type": "string" }, "TwitterName": { "type": "string" }, "GooglePlusProfile": { "type": "string" }, "HeightFormat": { "type": "string" }, "HeightCm": { "type": ["integer", "null"] }, "Weight": { "type": ["integer", "null"] }, "WeightKg": { "type": ["integer", "null"] }, "Record": { "type": "string", "pattern": "^[0-9]+-[0-9]+-[0-9]+$" }, "Striking": { "type": ["number", "null"], "maxdecimal": 15 }, "Takedowns": { "type": ["number", "null"], "maxdecimal": 15 }, "Submissions": { "type": ["number", "null"], "maxdecimal": 15 }, "LastFightUrl": { "type": "string" }, "LastFightEventText": { "type": "string" }, "NextFightUrl": { "type": "string" }, "NextFightEventText": { "type": "string" }, "LastFightDate": { "type": "string", "format": "timestamp" } }, "additionalProperties": false }</p><p>Upload the schema to Iglu</p></li>
<li><p>Then send data into Snowplow as self-describing JSONs</p><p>1. Validation 2. Dimension widening 3. Data modeling</p><p>Event (its "schema" field is the schema reference):</p><p>{ "schema": "iglu:com.israel365/temperature_measure/jsonschema/1-0-0", "data": { "timestamp": "2016-11-16 19:53:21", "location": "Berlin", "temperature": 3, "units": "Centigrade" } }</p><p>Schema:</p><p>{ "$schema": "", "description": "Schema for an ad impression event", "self": { "vendor": "com.israel365", "name": "temperature_measure", "format": "jsonschema", "version": "1-0-0" }, "type": "object", "properties": { "timestamp": { "type": "string" }, "location": { "type": "string" } } }</p></li>
<li><p>The schemas can then be used in a number of ways</p><p>Validate the data (important for data quality)</p><p>Load the data into tidy tables in your data warehouse</p><p>Make it easy and safe to write downstream data processing applications (e.g. for real-time users)</p></li>
<li><p>Event data modeling: Overview</p></li>
<li><p>What is event data modeling?</p><p>1. Validation 2. Dimension widening 3. Data modeling</p><p>Event data modeling is the process of using business logic to aggregate over event-level data to produce 'modeled' data that is simpler to query.</p></li>
<li><p>event 1 to event n</p><p>Users, Sessions, Funnels</p><p>Immutable. Unopinionated. Hard to consume. Not contentious.</p><p>Mutable and opinionated. Easy to consume. 
May be contentious.</p><p>Unmodeled data (events) vs. modeled data (users, sessions, funnels)</p></li>
<li><p>In general, event data modeling is performed on the complete event stream</p><p>Late-arriving events can change the way you understand earlier-arriving events</p><p>If we change our data models, this gives us the flexibility to recompute historical data based on the new model</p></li>
<li><p>The evolving event data pipeline</p></li>
<li><p>How do we handle pipeline evolution?</p><p>PUSH FACTORS: What is being tracked will change over time</p><p>PULL FACTORS: What questions are being asked of the data will change over time</p><p>Businesses are not static, so event pipelines should not be either</p><p>Sources (push): Web, Apps, Servers, Comms channels, Smart car / home</p><p>Collection, Processing</p><p>Data warehouse, Data exploration, Predictive modeling, Real-time dashboards</p><p>Real-time, data-driven applications: RT bidder, Voucher, Personalization</p></li>
<li><p>Push example: new source of event data</p><p>If data is self-describing, it is easy to add additional sources</p><p>Self-describing data is good for managing bad data and pipeline evolution</p><p>"I'm an email send event and I have information about the recipient (email address, customer ID) and the email (id, tags, variation)"</p></li>
<li><p>Pull example: new business question</p><p>Question?</p><p>Answer</p><p>Insight</p></li>
<li><p>Answering the question: 3 possibilities</p><p>1. Existing data model supports answer</p><p>2. Need to update data model</p><p>3. 
Need to update data model and data collection</p><p>Case 1: It is possible to answer the question with existing modeled data</p><p>Case 2: The data already collected supports the answer, but additional computation (additional logic) is required in the data modeling step</p><p>Case 3: Need to extend event tracking, and to update data models to incorporate the additional data (and potentially additional logic)</p></li>
<li><p>Self-describing data and the ability to recompute data models are essential to enable pipeline evolution</p><p>Self-describing data:</p><p>Update existing events and entities in a backward-compatible way, e.g. add optional new fields</p><p>Update existing events and entities in a backward-incompatible way, e.g. change field types, remove fields, add compulsory fields</p><p>Add new event and entity types</p><p>Recompute data models on entire data set:</p><p>Add new columns to existing derived tables, e.g. add new audience segmentation</p><p>Change the way existing derived tables are generated, e.g. change sessionization logic</p><p>Create new derived tables</p></li>
<li><p>Questions?</p></li></ul>
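<p>As a rough illustration of the validation step from the slides, the sketch below resolves an event's schema reference and checks field types. It is not Snowplow's or Iglu's actual implementation: the in-memory <code>REGISTRY</code>, the helpers <code>matches_type</code> and <code>validate</code>, and the small subset of JSON Schema handled (only <code>type</code> and <code>additionalProperties</code>) are assumptions made for this example.</p>

```python
import re

# Hypothetical in-memory stand-in for a schema registry, keyed by
# (vendor, name, format, version). A real pipeline would resolve the
# schema from a registry service instead.
REGISTRY = {
    ("com.israel365", "temperature_measure", "jsonschema", "1-0-0"): {
        "type": "object",
        "properties": {
            "timestamp": {"type": "string"},
            "location": {"type": "string"},
        },
    }
}

# Schema references in the slides look like iglu:vendor/name/format/version.
IGLU_URI = re.compile(r"^iglu:([^/]+)/([^/]+)/([^/]+)/(\d+-\d+-\d+)$")


def matches_type(value, type_name):
    """Check a value against one JSON Schema type name (small subset only)."""
    if type_name == "null":
        return value is None
    if type_name == "string":
        return isinstance(value, str)
    if isinstance(value, bool):  # bool is a subclass of int in Python
        return False
    if type_name == "integer":
        return isinstance(value, int)
    if type_name == "number":
        return isinstance(value, (int, float))
    raise ValueError("unsupported type in this sketch: " + type_name)


def validate(event, registry):
    """Return a list of error strings; an empty list means the event passed."""
    match = IGLU_URI.match(event.get("schema", ""))
    if match is None:
        return ["malformed schema reference"]
    schema = registry.get(match.groups())
    if schema is None:
        return ["schema not found: " + event["schema"]]

    errors = []
    data = event.get("data", {})
    properties = schema.get("properties", {})
    for field, rule in properties.items():
        if field not in data:
            continue  # this sketch treats every field as optional
        allowed = rule["type"]
        if isinstance(allowed, str):
            allowed = [allowed]
        if not any(matches_type(data[field], t) for t in allowed):
            errors.append(field + ": expected " + " or ".join(allowed))
    if schema.get("additionalProperties") is False:
        for field in data:
            if field not in properties:
                errors.append(field + ": not allowed by schema")
    return errors


event = {
    "schema": "iglu:com.israel365/temperature_measure/jsonschema/1-0-0",
    "data": {
        "timestamp": "2016-11-16 19:53:21",
        "location": "Berlin",
        "temperature": 3,
        "units": "Centigrade",
    },
}
print(validate(event, REGISTRY))
```

<p>Because the event names its own schema, the same function can validate any event type that has been registered, which is what makes adding a new source of event data cheap.</p>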
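<p>The slides argue that data models should be recomputed over the complete event stream, both because late-arriving events change how earlier ones are understood and because modeling logic (such as sessionization) changes over time. A minimal sketch of that idea follows. The field names <code>user_id</code> and <code>timestamp</code>, the function <code>sessionize</code>, and the inactivity timeouts are assumptions for illustration, not Snowplow's own data model.</p>

```python
from datetime import datetime, timedelta


def sessionize(events, timeout):
    """Derive a sessions table from the full, immutable event stream.

    A new session starts whenever the gap since the same user's previous
    event exceeds `timeout`.
    """
    sessions = []
    last_seen = {}  # user_id -> (timestamp of previous event, open session)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        user = event["user_id"]
        previous = last_seen.get(user)
        if previous is None or event["timestamp"] - previous[0] > timeout:
            session = {
                "user_id": user,
                "start": event["timestamp"],
                "end": event["timestamp"],
                "event_count": 0,
            }
            sessions.append(session)
        else:
            session = previous[1]
        session["end"] = event["timestamp"]
        session["event_count"] += 1
        last_seen[user] = (event["timestamp"], session)
    return sessions


def at(hour, minute):
    return datetime(2016, 11, 16, hour, minute)


EVENTS = [
    {"user_id": "u1", "timestamp": at(10, 0)},
    {"user_id": "u2", "timestamp": at(10, 5)},
    {"user_id": "u1", "timestamp": at(10, 10)},
    {"user_id": "u1", "timestamp": at(10, 50)},
]
# An event that reaches the pipeline after the others were first modeled.
LATE_EVENT = {"user_id": "u1", "timestamp": at(10, 30)}

# First pass: u1's 40-minute gap splits their activity into two sessions.
print(len(sessionize(EVENTS, timedelta(minutes=30))))                 # 3
# Recompute with the late event: the gap is bridged, so u1 has one session.
print(len(sessionize(EVENTS + [LATE_EVENT], timedelta(minutes=30))))  # 2
# Recompute with changed sessionization logic (a 5-minute timeout).
print(len(sessionize(EVENTS + [LATE_EVENT], timedelta(minutes=5))))   # 5
```

<p>The raw events are never modified. Only the derived sessions table is rebuilt, which is why the modeled data can stay mutable and opinionated while the unmodeled data stays immutable.</p>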