calculation | consulting data science leadership (TM) c|c (TM) charles@calculationconsulting.com

Cc hass b school talk 2105

Page 3: Cc hass b school talk  2105

Who Are We?

Dr. Charles H. Martin, PhD University of Chicago, Chemical Physics

NSF Fellow in Theoretical Chemistry

Over 10 years of experience in applied Machine Learning

Developed ML algos for Demand Media, the first $1B IPO since Google

Lean Start Ups: Aardvark (acquired by Google), eHow, Mode

Wall Street: BlackRock, GLG

Fortune 500: Big Pharma, Telecom, eBay

Page 4: Cc hass b school talk  2105

BackStory: in 2011, Search Changed. Forever.

• first $1B IPO since Google

• Machine Learning based SEO algorithms

• Measure the demand for search, and fulfill it

data science algorithms created a billion $ company

Demand Media

eHow.com

Page 5: Cc hass b school talk  2105

BackStory: in 2011, Search Changed. Forever.

• Google adapted (Panda)

• Lack of diversification

• Lack of adaptation

• Stock price never recovered

algorithmic accountability: DMD or Google?

[Chart: DMD stock price 2011-2012, with the IPO and Panda marked]

Page 6: Cc hass b school talk  2105

Problem: Cornering the market on search induced a market crash

• first $1B collapse due to Panda?

• CPC revenues down

• premium online publishers died

[Chart: stock price 2011-2012, with the collapse marked "?"]

$1B in ad revenue was repriced and reallocated

Page 7: Cc hass b school talk  2105

Panda-Induced ‘Market Crash’

Google CPC dropped just after Panda

Page 8: Cc hass b school talk  2105

Data Science is Different

Thomas H. Davenport

Generating sustainable revenue requires Data Science Leadership and Execution

“Companies need a Spock in the boardroom”

Page 9: Cc hass b school talk  2105

Data Science is Different

Thomas H. Davenport

Generating sustainable revenue requires Data Science Leadership and Execution

http://www.theonion.com/articles/national-science-foundation-science-hard,1405/

Page 10: Cc hass b school talk  2105

Problem: Data Scientists are Different

Thomas H. Davenport

not all techies are the same

Page 11: Cc hass b school talk  2105

Problem: Data Scientists are Different

Thomas H. Davenport

not all techies are the same

theoretical physics: machine learning specialist

experimental physics: data scientist

engineer: software, browser tech, dev ops, …

Page 12: Cc hass b school talk  2105

Problem: Data Scientists are Different

Thomas H. Davenport

not all techies are the same

Page 13: Cc hass b school talk  2105

Managing: Data Science Process

• Acquire Domain Knowledge

• Formulate Hypothesis

• Generate Model(s) from the Data

• Predict Revenue Gains

• Backtest Predictions on your Data

• A/B Test in Production

• Attribute Gains to Model(s)

[Diagram labels: framing, solving, acting]
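The "A/B Test in Production" and "Attribute Gains to Model(s)" steps can be sketched as a simple significance check. This is only an illustration, assuming conversion rate is the metric of interest and a two-proportion z-test is an acceptable test; the function name `ab_test` and all the numbers are hypothetical and do not come from the talk.

```python
import math


def ab_test(control_conversions, control_visitors,
            treatment_conversions, treatment_visitors):
    """Two-proportion z-test comparing treatment (new model) against control."""
    p_control = control_conversions / control_visitors
    p_treatment = treatment_conversions / treatment_visitors
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (control_conversions + treatment_conversions) / (
        control_visitors + treatment_visitors
    )
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / control_visitors + 1 / treatment_visitors)
    )
    z = (p_treatment - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Relative lift is the gain one would attribute to the model.
    relative_lift = (p_treatment - p_control) / p_control
    return relative_lift, z, p_value


# Hypothetical traffic: 10,000 visitors per arm.
lift, z, p_value = ab_test(200, 10_000, 260, 10_000)
print(f"lift={lift:.1%}  z={z:.2f}  p={p_value:.4f}")
```

A backtest would apply the same comparison to historical data before any production traffic is exposed to the model.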

Page 14: Cc hass b school talk  2105

Managing: Data Science Process

Page 15: Cc hass b school talk  2105

Managing: Learning from Data

• Systems Thinking: leveraging the inter-relationships between data, marketing, and the customer

• Knowledge Transfer: mentoring, not training, to develop both personal mastery and team learning

• Mental Models: create a base of small-scale models for thinking about how to use your data

• Knowledge Sharing: foster collaboration between research, engineering, and product to drive revenue

Page 16: Cc hass b school talk  2105

Data Science is Different

• Cross-functional: engineering, product, marketing, finance

• Autonomous: separate from the traditional engineering product lifecycle; self-organizing and self-managing

• Experimental: form hypotheses, analyze data, make predictions, run backtests, A/B test

• Self-sustaining: not a cost center; generates revenue

Page 17: Cc hass b school talk  2105

Solution: Collecting and Organizing Data

• Most companies are struggling to organize their data

• Data needs to be examined

• Don’t assume data is correct or useful

• More is More: simple algos work

• More is Less: noise is noise

Data not examined is not collected

Page 18: Cc hass b school talk  2105

Solutions: Hadoop and Big Data

• Hadoop is an internal data ecosystem

• Hadoop appears to have won the adoption wars?

• Hadoop: 90% of deployments are internal

• Hadoop is a cost center

• ROI needs to cut across business divisions

Algorithms, not data, generate revenue

Page 19: Cc hass b school talk  2105

Solutions: Cloud

• Startups don’t need infrastructure

• Long-term data storage is virtually free

• Amazon Redshift

• Google BigQuery

• Cloud is the future?

Algorithms, not data, generate revenue

Page 20: Cc hass b school talk  2105

Solutions: Spark

• Next Gen Platform for Machine Learning

• Sits on Hadoop or the Cloud

• Still very high touch

• Limited algos

Algorithms, not data, generate revenue

Page 21: Cc hass b school talk  2105

Problem: Measurements

“If you can’t measure it, you can’t fix it.”

DJ Patil, White House Chief Data Scientist

good experiments are amazing

Page 22: Cc hass b school talk  2105

Data Science’s Measurement Problem

good experiments are hard to design

http://www.forbes.com/sites/lizryan/2014/02/10/if-you-cant-measure-it-you-cant-manage-it-is-bs/

Page 23: Cc hass b school talk  2105

Data Science’s Measurement Problem

good experiments are hard to design

“Data science has a measurement problem. Simple metrics may not address complex situations. But complex metrics present myriad problems.”

“As we strive for better algorithms, we often fail to think critically about what it means for predictions to be ‘good’”

http://www.kdnuggets.com/2015/03/data-science-measurement-problem-accuracy-auroc-f1.html
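A small worked example of how a simple metric can mislead, assuming a binary classification task with rare positives. The labels below are hypothetical and the two metric functions are minimal hand-written versions, not taken from the cited article.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives found, so precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: 10 positives out of 1,000 cases (1% base rate).
y_true = [1] * 10 + [0] * 990
# A "model" that never predicts the positive class.
always_negative = [0] * 1000

print("accuracy:", accuracy(y_true, always_negative))
print("F1:", f1_score(y_true, always_negative))
```

The do-nothing predictor scores 99% accuracy while finding none of the cases that matter, which is why the choice of metric has to follow from the business question.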

Page 24: Cc hass b school talk  2105

Data Science’s Measurement Problem

good experiments are hard to design

“Buffett found it 'extraordinary' that academics studied such things. They studied what was measurable, rather than what was meaningful. ‘… to a man with a hammer, everything looks like a nail.’”

Roger Lowenstein, Buffett: The Making of an American Capitalist

Page 25: Cc hass b school talk  2105

Problem: The Cult of the Algorithm

“We have a new machine learning algo that anticipate your needs over time and behave accordingly”

what can algos actually do?

Page 26: Cc hass b school talk  2105

Problem: What can Machine Learning Do?

what can algos actually do?

Page 27: Cc hass b school talk  2105

Demand Algos: Gas Station Analogy

Problem: where to open a gas station?

Need: good traffic, weak competition

[Diagram: fewer competitors but no traffic | sweet spot | great traffic but too many competitors]

all businesses balance supply and demand
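The gas station analogy can be sketched as a toy scoring rule. The formula (traffic divided by one plus the number of competitors), the site names, and the numbers are all illustrative assumptions; the talk does not specify how demand and competition were actually combined.

```python
def demand_score(traffic, competitors):
    """Toy score: demand (traffic) discounted by existing supply (competitors)."""
    return traffic / (1 + competitors)


# Hypothetical candidate locations: daily traffic and competing stations nearby.
sites = {
    "remote road": {"traffic": 50, "competitors": 0},
    "suburban junction": {"traffic": 4000, "competitors": 1},
    "downtown strip": {"traffic": 9000, "competitors": 9},
}

scores = {name: demand_score(**site) for name, site in sites.items()}
best_site = max(scores, key=scores.get)
print(scores)
print("sweet spot:", best_site)
```

The same shape applies to search: query volume plays the role of traffic, and the number of strong existing pages plays the role of competitors.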

Page 28: Cc hass b school talk  2105

SAAS Machine Learning Algos

machine learning contests

• Diabetic Retinopathy Detection ($100,000 • 167 teams)

• March Machine Learning Mania 2015 ($15,000 • 341 teams)

Page 29: Cc hass b school talk  2105

SAAS Machine Learning Algos

machine learning APIs

Page 30: Cc hass b school talk  2105

Problem: What can Deep Learning Do?

what can algos actually do?

Page 31: Cc hass b school talk  2105

Problem: Externalities

external factors can change

Page 32: Cc hass b school talk  2105

Problem: Externalities

“Zynga is our best company ever!” (2010)

John Doerr, Google Investor, Legendary VC

http://venturebeat.com/2010/11/16/google-investor-john-doerr-zynga-is-our-best-company-ever/

one marketplace | big risks

Page 33: Cc hass b school talk  2105

Solution: Algorithmic Accountability

An asset is an economic resource.

Anything tangible or intangible that is capable of being owned or controlled to produce value and that is held to have positive economic value is considered an asset.

algorithms can be valuable assets

Page 34: Cc hass b school talk  2105

Algorithmic Accountability

does revenue depend on hidden algos?

• WebMD Google SEO

• Amazon Product Listing Algo

• Pinterest Relevance Algo

• Twitter Spam filter

• Apple App Store Rankings

Page 35: Cc hass b school talk  2105

Algorithmic Accountability

do decisions depend on hidden factors?

A 'Crisis' in Online Ads: One-Third of Traffic Is Bogus
http://www.wsj.com/articles/SB10001424052702304026304579453253860786362

Now Algorithms Are Deciding Whom To Hire…
http://www.npr.org/blogs/alltechconsidered/2015/03/23/394827451/now-algorithms-are-deciding-whom-to-hire-based-on-voice

What you don’t know about Internet algorithms is hurting you…
http://www.washingtonpost.com/news/the-intersect/wp/2015/03/23/what-you-dont-know-about-internet-algorithms-is-hurting-you-and-you-probably-dont-know-very-much/

Page 36: Cc hass b school talk  2105

Solution: Algorithmic Transparency

can you be transparent and not be gamed?

How do you govern a (hidden, fluid and amoral) algorithm?
http://fortune.com/2015/03/18/how-do-you-govern-a-hidden-fluid-and-amoral-algorithm/

83% of the participants in the study changed their behavior once they knew about the algorithm

participants mistakenly believed that their friends intentionally chose not to show them stories

Page 37: Cc hass b school talk  2105

Algorithmic Accountability

Do you depend on someone else’s marketplace?

How does your revenue depend on algos?

Do you need an internal algo?

Who will manage it? build it? maintain it?

algorithms have unforeseen liabilities
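One rough way to put a number on "How does your revenue depend on algos?" is to measure how concentrated revenue is across acquisition channels. The sketch below uses the Herfindahl-Hirschman index (sum of squared shares) as an assumed concentration measure; the channel names and revenue figures are hypothetical and not from the talk.

```python
def revenue_concentration(revenue_by_source):
    """Return the largest source, its revenue share, and the HHI of all shares."""
    total = sum(revenue_by_source.values())
    shares = {source: amount / total for source, amount in revenue_by_source.items()}
    # Herfindahl-Hirschman index: 1.0 means a single source, lower means diversified.
    hhi = sum(share ** 2 for share in shares.values())
    top_source = max(shares, key=shares.get)
    return top_source, shares[top_source], hhi


# Hypothetical monthly revenue in $K, by the channel that delivered the customer.
revenue = {"organic search": 80.0, "direct": 12.0, "email": 8.0}

top_source, top_share, hhi = revenue_concentration(revenue)
print(f"{top_source}: {top_share:.0%} of revenue, HHI={hhi:.4f}")
```

A high share from a channel ranked by someone else's algorithm is the kind of exposure the questions above are meant to surface.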

Page 38: Cc hass b school talk  2105

c|c (TM)

charles@calculationconsulting.com