THE SIDEKICK PATTERN: USING SMALL DATA TO MULTIPLY THE VALUE OF BIG DATA
@AbeGong
Data Scientist, Jawbone
Strata - February 2014
Wednesday, February 12, 14

The Sidekick Pattern: Strata talk by Abe Gong

Slides from my Strata talk: http://strataconf.com/strata2014/public/schedule/speaker/163953

Abstract: Creating value from big, messy data sets can be a daunting task. This session introduces the Sidekick Pattern: using small, curated data to increase the value of Big Data. Drawing on lessons from data science for Jawbone’s UP fitness tracker, we will see how smart selection of data sidekicks can accelerate analysis, solve cold start problems, and simplify complicated data pipelines.

DATA SIDEKICKS

EX: HIEROGLYPH TRANSLATION

EX: CAMPAIGN TARGETING

EX: SLEEP CONTEXT

SUB-TITLE [DATA ART EXAMPLE]

EXAMPLES, PLEASE: WHICH DATA STREAMS GET BIG?
(...AND BESIDES SIZE, WHAT ELSE DO THEY HAVE IN COMMON?)

BIG, RICH, MESSY

CAREFULLY CURATED
BIG, RICH, MESSY

TRANSMUTATION!

EX: HUFFPO MODERATION

WHEN SHOULD I USE THE SIDEKICK PATTERN?

• To separate munging and cleaning from scaling.

• To bootstrap new data products.

• To leverage variety against volume.
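
The first bullet, separating munging and cleaning from scaling, can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration and none of it comes from the talk: the city-name data, the `SIDEKICK` lookup table, and the helpers `normalize`, `label`, and `summarize` are all hypothetical. The idea it tries to show is that the curated knowledge lives in a small table a person can review, while the code that touches the big stream stays generic.

```python
from collections import Counter

# Hypothetical sidekick: a small, hand-curated lookup table.
# All cleaning decisions live here, where a person can review them.
SIDEKICK = {
    "nyc": "New York",
    "new york city": "New York",
    "new york": "New York",
    "sf": "San Francisco",
    "san fran": "San Francisco",
    "san francisco": "San Francisco",
}


def normalize(raw):
    """Generic cleanup: lowercase, drop periods, collapse whitespace."""
    return " ".join(raw.lower().replace(".", "").split())


def label(records, sidekick):
    """Lazily yield (raw, canonical or None), so the input can be a large stream."""
    for raw in records:
        yield raw, sidekick.get(normalize(raw))


def summarize(records, sidekick):
    """Count canonical labels, and track normalized values the sidekick misses."""
    counts = Counter()
    unmatched = Counter()
    for raw, canonical in label(records, sidekick):
        if canonical is None:
            unmatched[normalize(raw)] += 1
        else:
            counts[canonical] += 1
    return counts, unmatched


# Stand-in for the big, messy stream (made-up values).
messy = ["NYC", " new york city ", "S.F.", "San Fran", "Bostn", "sf", "bostn"]
counts, unmatched = summarize(messy, SIDEKICK)
print(counts)
print(unmatched.most_common(1))
```

In this sketch the `unmatched` counter is the feedback loop: the most frequent misses are the next candidates for curation, so the small table can improve without any change to the code that scans the large data.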

EX: SLEEP RECOVERY

LEVELS OF ABSTRACTION

QUESTIONS? COMMENTS?

@AbeGong
Data Scientist, Jawbone
Strata - February 2014

Small, Focused, Curated
Abstract, Business logic, Internal-facing
“Quantitative”, Science-making

Big, Rich, Messy
Sensory, User experience, External-facing
“Qualitative”, Story-making

TRANSMUTATION EXAMPLES

Example                               Property
Rosetta stone                         Synonyms/Comparability
Campaign targeting                    Demographic categories
Sleep context                         Context
Instrumental variables                Causality
HuffPo moderation                     Credibility
Sleep recovery                        Clean examples
Economic mobility                     Continuity
Crowdflower gold                      Credibility
Bridge cases in IRT scaling models    Relative ranking
Sentiment analysis                    Categories
Pretty much all supervised learning   Categories/Scales
...

RECOMMENDED READING

• Pete Skomoroch: http://www.slideshare.net/pskomoroch/strata-endorsements-16939466

• Paco Nathan: http://www.slideshare.net/pacoid/using-cascalog-to-build-an-app-based-on-city-of-palo-alto-open-data

• Jay Kreps: http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

• Joseph Turian: http://files.meetup.com/1542972/20120202-more-data-same-models-STUDY-SLIDES.pdf

• Me: http://blog.abegong.com/2014/02/wanted-good-examples-of-data-sidekicks.html
