Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017

Irene Gonzálvez, Product Manager at Spotify

Big Data, Big Quality?

Irene Gonzálvez, Product Manager, Data Infrastructure

Music Streaming Service

Launched in 2008

Premium and Free Tiers

Available in 61 Countries

Over 140M Monthly Active Users

More than 30M Songs

Over 1 billion plays per day

Data enables recommendations, advertising, label and artist payments and more

Data First

Data of Good Quality First

Data quality problems cost US businesses $600B a year! (The Data Warehousing Institute)

Data Quality Dimensions

● Timeliness

● Correctness

● Completeness

● Consistency

DataMon

Data Counters

MetriLab


Data Quality Dimensions

● Timeliness

● Correctness

● Completeness

● Consistency

DataMon, Data Counters, MetriLab

TC4D: Test Certified for Data

Level 1: Set-up, monitoring, alerting and documentation

Level 2: Data management and unit tests

Level 3: Build your defenses

What’s next?

Build an algorithm library for anomaly detection (ML4ALL)

Provide the infrastructure to ‘plug & play’ more algorithms

Provide parameter recommendations to tweak the algorithms
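The ‘plug & play’ idea above could be sketched roughly as follows. This is a minimal illustration, not Spotify's implementation: the registry `DETECTORS`, the `register` decorator, both detector functions and their default thresholds are hypothetical names and values chosen for the example.

```python
from statistics import mean, stdev
from typing import Callable, Dict, Sequence

# Hypothetical registry mapping an algorithm name to a detector function.
# A detector receives the historical values and the latest value, and
# returns True when the latest value looks anomalous.
Detector = Callable[..., bool]
DETECTORS: Dict[str, Detector] = {}


def register(name: str) -> Callable[[Detector], Detector]:
    """Decorator that plugs a new detector into the registry."""
    def wrap(fn: Detector) -> Detector:
        DETECTORS[name] = fn
        return fn
    return wrap


@register("zscore")
def zscore_detector(history: Sequence[float], latest: float,
                    threshold: float = 3.0) -> bool:
    """Flag values more than `threshold` standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    sigma = stdev(history)
    if sigma == 0:
        return latest != history[0]  # constant history: any change is unusual
    return abs(latest - mean(history)) / sigma > threshold


@register("relative_change")
def relative_change_detector(history: Sequence[float], latest: float,
                             max_change: float = 0.5) -> bool:
    """Flag values that moved more than `max_change` relative to the previous one."""
    if not history or history[-1] == 0:
        return False
    return abs(latest - history[-1]) / abs(history[-1]) > max_change


def is_anomalous(algorithm: str, history: Sequence[float], latest: float,
                 **params: float) -> bool:
    """Run the named detector; `params` override its default parameters."""
    return DETECTORS[algorithm](history, latest, **params)


if __name__ == "__main__":
    daily_row_counts = [1000, 1020, 990, 1010, 1005]  # made-up example data
    print(is_anomalous("zscore", daily_row_counts, 400))
    print(is_anomalous("relative_change", daily_row_counts, 400, max_change=0.5))
```

Adding an algorithm only requires a new decorated function, and the keyword parameters are the knobs that a parameter recommendation would tune.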

What’s next?

Spotify-wide strategy

● Have metrics to understand when a dataset qualifies as ‘good’ quality.

● Identify which datasets are critical/central to Spotify and bring them up to ‘good’ quality
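One way to make ‘good’ quality measurable is to score a dataset on a few of the dimensions named earlier and compare the result against a threshold. The sketch below is an assumption made for illustration, covering only completeness and timeliness: `QualityReport`, `completeness`, `assess` and the 0.99 default threshold are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Sequence


@dataclass
class QualityReport:
    completeness: float  # share of required fields that are filled, 0.0 to 1.0
    on_time: bool        # True if the dataset arrived by its deadline

    def is_good(self, min_completeness: float = 0.99) -> bool:
        """Hypothetical rule: on time and complete enough."""
        return self.on_time and self.completeness >= min_completeness


def completeness(rows: List[Dict[str, Any]],
                 required_fields: Sequence[str]) -> float:
    """Fraction of required field values that are present (not None)."""
    if not rows or not required_fields:
        return 0.0
    total = len(rows) * len(required_fields)
    filled = sum(1 for row in rows for field in required_fields
                 if row.get(field) is not None)
    return filled / total


def assess(rows: List[Dict[str, Any]], required_fields: Sequence[str],
           delivered_at: datetime, deadline: datetime) -> QualityReport:
    """Combine the completeness and timeliness checks into one report."""
    return QualityReport(
        completeness=completeness(rows, required_fields),
        on_time=delivered_at <= deadline,
    )


if __name__ == "__main__":
    # Made-up example rows; the field names are illustrative only.
    plays = [
        {"user_id": 1, "track_id": "a"},
        {"user_id": 2, "track_id": None},
    ]
    report = assess(plays, ["user_id", "track_id"],
                    delivered_at=datetime(2017, 11, 17, 5, 0),
                    deadline=datetime(2017, 11, 17, 6, 0))
    print(report, report.is_good())
```

Correctness and consistency checks would need dataset-specific rules, so they are left out of this sketch.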

Key Takeaways

Lesson #1: Think big

Understand your org’s pain points

Lesson #2: Start small

And start NOW!

Lesson #3: Data Quality is not an add-on

Insights can ONLY be as good as the data

Data will increase 10x by 2025 (International Data Corporation)

1 ZB = 1 trillion GB

20% Critical Data

10% Hypercritical Data

Q&A

Irene Gonzálvez, Product Manager, Spotify