DESCRIPTION
Slides from my Strata talk: http://strataconf.com/strata2014/public/schedule/speaker/163953
Abstract: Creating value from big, messy data sets can be a daunting task. The session introduces the Sidekick Pattern: using small, curated data to increase the value of Big Data. Drawing on lessons from data science for Jawbone’s UP fitness tracker, we will see how smart selection of data sidekicks can accelerate analysis, solve cold start problems, and simplify complicated data pipelines.
THE SIDEKICK PATTERN: USING SMALL DATA TO MULTIPLY THE VALUE OF BIG DATA
@AbeGong
Data Scientist, Jawbone
Strata - February 2014
Wednesday, February 12, 14
DATA SIDEKICKS
EX: HIEROGLYPH TRANSLATION
EX: CAMPAIGN TARGETING
EX: SLEEP CONTEXT
SUB-TITLE [DATA ART EXAMPLE]
EXAMPLES, PLEASE: WHICH DATA STREAMS GET BIG?
(...AND BESIDES SIZE, WHAT ELSE DO THEY HAVE IN COMMON?)
BIG, RICH, MESSY
CAREFULLY CURATED
BIG, RICH, MESSY
TRANSMUTATION!
EX: HUFFPO MODERATION
WHEN SHOULD I USE THE SIDEKICK PATTERN?
• To separate munging and cleaning from scaling.
• To bootstrap new data products.
• To leverage variety against volume.
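As a rough illustration of the first bullet, here is a minimal sketch, not taken from the talk: a small, hand-curated lookup table (the "sidekick") carries the cleaning decisions, while the step that touches the big stream is only a normalize-and-join. The names `SIDEKICK`, `normalize`, and `label_stream`, and all the sample values, are hypothetical.

```python
from collections import Counter

# Hypothetical sidekick: a small, hand-curated table mapping messy raw
# labels to clean categories. All names and values are illustrative.
SIDEKICK = {
    "nyc": "New York",
    "new york city": "New York",
    "sf": "San Francisco",
    "san fran": "San Francisco",
}


def normalize(raw):
    """Light munging so raw values can be matched against the sidekick."""
    return raw.strip().lower()


def label_stream(records, sidekick):
    """Lazily join each record of a (potentially huge) stream to the sidekick."""
    for raw in records:
        # Records the sidekick does not cover are flagged, not guessed.
        yield sidekick.get(normalize(raw), "UNKNOWN")


# Stand-in for a big, messy data stream.
big_messy_stream = ["NYC ", "san fran", "SF", "Boston", " new york city"]
counts = Counter(label_stream(big_messy_stream, SIDEKICK))
print(counts)
```

Under these assumptions, improving coverage means editing the small table by hand, and the streaming step can be scaled up without changing any cleaning logic.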
EX: SLEEP RECOVERY
LEVELS OF ABSTRACTION
QUESTIONS? COMMENTS?
@AbeGong
Data Scientist, Jawbone
Strata - February 2014
Small, Focused, Curated    Big, Rich, Messy
Abstract                   Sensory
Business logic             User experience
Internal-facing            External-facing
“Quantitative”             “Qualitative”
Science-making             Story-making
TRANSMUTATION EXAMPLES
Example                                Property
Rosetta stone                          Synonyms/Comparability
Campaign targeting                     Demographic categories
Sleep context                          Context
Instrumental variables                 Causality
HuffPo moderation                      Credibility
Sleep recovery                         Clean examples
Economic mobility                      Continuity
Crowdflower gold                       Credibility
Bridge cases in IRT scaling models     Relative ranking
Sentiment analysis                     Categories
Pretty much all supervised learning    Categories/Scales
...
RECOMMENDED READING
• Pete Skomoroch: http://www.slideshare.net/pskomoroch/strata-endorsements-16939466
• Paco Nathan: http://www.slideshare.net/pacoid/using-cascalog-to-build-an-app-based-on-city-of-palo-alto-open-data
• Jay Kreps: http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
• Joseph Turian: http://files.meetup.com/1542972/20120202-more-data-same-models-STUDY-SLIDES.pdf
• Me: http://blog.abegong.com/2014/02/wanted-good-examples-of-data-sidekicks.html