yalisassoon
Where we came from and where we’re going
March 2016
Snowplow was born in 2012
Web data is rich, but GA / SiteCatalyst are limited
“Big data” tech
• Marketing, not product analytics
• Siloed: can’t join with other customer data
Snowplow
• Open source frameworks
• Cloud services
• Open source clickstream data warehouse
• Event level: any query
• Built on top of CloudFront / EMR / Hadoop
The plan: spend 6 months building a pipeline…
…then get back to using the data
So what went wrong?
Increased project scope
• Clickstream data warehouse -> event analytics platform
• Collect events from anywhere, not just the web
• Make event data actionable in real-time
• Support more in-pipeline processing steps (enrichment and modeling)
• Support more storage targets (where your data is has big implications for what you can do with that data)
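"Enrichment" here means adding derived fields to each event as it flows through the pipeline. A toy sketch of the idea, deriving URL components from a tracked page URL (the field names are illustrative, not Snowplow's canonical ones):

```python
from urllib.parse import urlparse

def enrich(event):
    """Toy in-pipeline enrichment step: derive structured fields
    from the raw page_url. Field names here are illustrative."""
    enriched = dict(event)  # never mutate the incoming event
    url = urlparse(event["page_url"])
    enriched["page_host"] = url.netloc
    enriched["page_path"] = url.path
    return enriched

enrich({"page_url": "https://shop.example.com/basket?item=1"})
# adds page_host "shop.example.com" and page_path "/basket"
```

Real Snowplow enrichments (IP lookup, referer parsing, campaign attribution, etc.) follow the same shape: pure functions applied per event between collection and storage.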
Track events from anywhere
• Events
• Entities
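Snowplow models tracking as an event plus the entities involved in it, each wrapped as self-describing JSON: an `iglu:` schema URI alongside the data it describes. A minimal sketch of that shape (the vendor `com.acme` and schema names are hypothetical, not real registry entries):

```python
# Sketch of Snowplow's event/entity model: one event plus attached
# entities ("contexts"), each wrapped as self-describing JSON.
# Schema URIs below are hypothetical examples.

def self_describing(schema_uri, data):
    """Pair a payload with the schema URI that describes it."""
    return {"schema": schema_uri, "data": data}

event = self_describing(
    "iglu:com.acme/add_to_basket/jsonschema/1-0-0",
    {"sku": "ABC-123", "quantity": 2},
)

entities = [
    self_describing("iglu:com.acme/user/jsonschema/1-0-0", {"id": "u42"}),
    self_describing("iglu:com.acme/product/jsonschema/1-0-0", {"sku": "ABC-123"}),
]

tracked = {"event": event, "contexts": entities}
```

Because every payload carries its own schema reference, the same entities can be attached to many different event types without redefining them.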
Make event data actionable in real-time
• Personalization
• Marketing automation
• Content analytics
Today, Snowplow is an event data pipeline
What makes Snowplow special?
• Data pipeline evolves with your business
• Channel coverage
• Flexibility: where your data is delivered
• Flexibility: how your data is processed (enrichment and modeling)
• Data quality
• Speed
• Transparency
Used by 100s (1000s?) of companies…
…to answer their most important business questions
But there’s still much more to build!
• Improve automation around schema evolution
• Make modeling event data easier, more robust, more performant
• Support more storage targets
• Make it easier to act on event data
Data modeling in Spark
Druid, BigQuery, graph databases
Analytics SDKs, Sauna
Iglu: machine-readable schema registry
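Iglu identifies each schema with a URI of the form `iglu:vendor/name/format/version`, and Snowplow versions schemas as MODEL-REVISION-ADDITION (SchemaVer), where a MODEL bump marks a breaking change. A sketch of parsing such a URI and flagging a breaking bump (the vendor, schema name, and `parse_iglu_uri` helper are my own illustrative choices):

```python
# Sketch: parse an iglu: schema URI and its SchemaVer version, then
# use the MODEL component to spot breaking schema changes.
from collections import namedtuple

SchemaKey = namedtuple("SchemaKey", "vendor name format model revision addition")

def parse_iglu_uri(uri):
    """Split an iglu: URI into vendor/name/format plus SchemaVer parts."""
    if not uri.startswith("iglu:"):
        raise ValueError(f"not an iglu URI: {uri!r}")
    vendor, name, fmt, version = uri[len("iglu:"):].split("/")
    model, revision, addition = (int(p) for p in version.split("-"))
    return SchemaKey(vendor, name, fmt, model, revision, addition)

def breaking_change(old_uri, new_uri):
    """Under SchemaVer, a change to MODEL signals a breaking change."""
    return parse_iglu_uri(old_uri).model != parse_iglu_uri(new_uri).model

breaking_change("iglu:com.acme/checkout/jsonschema/1-0-0",
                "iglu:com.acme/checkout/jsonschema/1-1-0")  # non-breaking
breaking_change("iglu:com.acme/checkout/jsonschema/1-1-0",
                "iglu:com.acme/checkout/jsonschema/2-0-0")  # breaking
```

Automating schema evolution (the roadmap item above) largely comes down to machinery around checks like this: a REVISION or ADDITION bump can be rolled out in place, while a MODEL bump needs new tables or migrations.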
Questions?
• Can take questions now or after the other talks