Snowplow: where we came from and where we are going - March 2016



Snowplow was born in 2012

Web data: rich, but GA / SiteCatalyst are limited

• Marketing, not product analytics

• Siloed: can’t join with other customer data

“Big data” tech

Snowplow

• Open source frameworks

• Cloud services

• Open source clickstream data warehouse

• Event-level data: any query is possible

• Built on top of CloudFront / EMR / Hadoop

The plan: spend 6 months building a pipeline…

…then get back to using the data

So what went wrong?

Increased project scope

• Clickstream data warehouse -> event analytics platform

• Collect events from anywhere, not just the web

• Make event data actionable in real-time

• Support more in-pipeline processing steps (enrichment and modeling)

• Support more storage targets (where your data is has big implications for what you can do with that data)

Track events from anywhere

• Events

• Entities
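As an illustration, here is how an event and an attached entity might be tracked with the Snowplow Python tracker. This is a minimal sketch: the collector endpoint, schema URIs, and field names are illustrative assumptions, and exact method names vary by tracker version (older releases call this track_unstruct_event).

```python
# Sketch: tracking a self-describing event with an attached entity
# (custom context) via the Snowplow Python tracker. The endpoint,
# schema URIs, and fields below are hypothetical.
from snowplow_tracker import Emitter, Tracker, SelfDescribingJson

emitter = Emitter("collector.example.com")  # hypothetical collector endpoint
tracker = Tracker(emitter, app_id="demo-app")

# The event: what happened
video_played = SelfDescribingJson(
    "iglu:com.example/video_played/jsonschema/1-0-0",  # hypothetical schema
    {"video_id": "v123", "position_secs": 42},
)

# The entity: who/what the event involved, attached as a context
user = SelfDescribingJson(
    "iglu:com.example/user/jsonschema/1-0-0",  # hypothetical schema
    {"user_id": "u456", "plan": "pro"},
)

tracker.track_self_describing_event(video_played, context=[user])
```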

Make event data actionable in real-time

• Personalization

• Marketing automation

• Content analytics

Today, Snowplow is an event data pipeline

What makes Snowplow special?

• Data pipeline evolves with your business

• Channel coverage

• Flexibility: where your data is delivered

• Flexibility: how your data is processed (enrichment and modeling)

• Data quality

• Speed

• Transparency

Used by 100s (1000s?) of companies…

…to answer their most important business questions

But there’s still much more to build!

• Improve automation around schema evolution

• Make modeling event data easier, more robust, more performant

• Support more storage targets

• Make it easier to act on event data

Data modeling in Spark
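A minimal sketch of what this can look like: sessionizing enriched events with PySpark. The input path, field names, and the 30-minute inactivity cutoff are illustrative assumptions, not a prescribed model.

```python
# Sketch: rolling Snowplow enriched events up into per-user sessions
# with PySpark. Paths and field names (user_id, event_ts) are hypothetical.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("snowplow-modeling").getOrCreate()

events = spark.read.json("s3://example-bucket/enriched/")  # hypothetical path

# A new session starts after 30 minutes of inactivity
w = Window.partitionBy("user_id").orderBy("event_ts")
gap = F.col("event_ts").cast("long") - F.lag("event_ts").over(w).cast("long")

sessions = (
    events
    .withColumn("new_session", (gap > 30 * 60).cast("int"))
    .withColumn("session_idx", F.sum(F.coalesce("new_session", F.lit(0))).over(w))
    .groupBy("user_id", "session_idx")
    .agg(
        F.min("event_ts").alias("session_start"),
        F.max("event_ts").alias("session_end"),
        F.count("*").alias("event_count"),
    )
)

sessions.write.parquet("s3://example-bucket/modeled/sessions/")  # hypothetical
```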

Druid, BigQuery, graph databases

Analytics SDKs, Sauna
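The Analytics SDKs make the enriched-event format easy to consume from code. A minimal sketch with the Python Analytics SDK, which transforms a line of enriched-event TSV into JSON (the local input file is an illustrative assumption):

```python
# Sketch: converting Snowplow enriched-event TSV lines to JSON using
# the Python Analytics SDK. The file path is hypothetical.
import json

import snowplow_analytics_sdk.event_transformer

with open("enriched_events.tsv") as f:
    for line in f:
        event = snowplow_analytics_sdk.event_transformer.transform(line)
        print(json.dumps(event))
```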

Iglu: machine-readable schema registry
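An Iglu schema is a standard JSON Schema plus a self block naming the vendor, event, format, and SchemaVer version, so the pipeline can resolve and validate events against it. A minimal sketch, written as a Python dict and checked with the jsonschema library (the vendor, event name, and fields are illustrative assumptions):

```python
# Sketch: an Iglu self-describing JSON Schema, validated against an
# event payload with the jsonschema library. Vendor, name, and fields
# are hypothetical; the "self" block and version follow the Iglu format.
import jsonschema

video_played_schema = {
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "self": {
        "vendor": "com.example",   # hypothetical vendor
        "name": "video_played",    # hypothetical event name
        "format": "jsonschema",
        "version": "1-0-0",        # SchemaVer: MODEL-REVISION-ADDITION
    },
    "type": "object",
    "properties": {
        "video_id": {"type": "string"},
        "position_secs": {"type": "integer", "minimum": 0},
    },
    "required": ["video_id"],
    "additionalProperties": False,
}

# Validate an event payload against the schema
jsonschema.validate(
    instance={"video_id": "v123", "position_secs": 42},
    schema=video_played_schema,
)
```

Under SchemaVer, adding an optional property bumps 1-0-0 to 1-0-1, while a breaking change bumps the model to 2-0-0; automating that evolution is one of the items above.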

Questions?

• Can take questions now or after the other talks
