Data science - a commercial perspective

Gordon Blunt

Gordon Blunt Analytics Ltd

Royal Statistical Society annual conference, 9th September 2015



Outline

1 Background

2 Data science

3 Skills needed
- ‘Softer’ skills
- Statistical skills

4 Concluding thoughts

5 References


My background

Work - ‘client side’
- Fast moving consumer goods (FMCG)
- Royal Mail
- Barclaycard

Work - consultancy
- CACI Ltd
- GfK NOP Ltd
- Gordon Blunt Analytics Ltd (2008→)

FMCG / Financial services / Data consultancy / Market research


Nature of data science

My starting point
- Data science is statistics, or
- Statistics is, and always has been, data science

Data are the most important part of statistics
- I’m not alone in this view . . .
- ‘Statistics starts with data’ [Breiman 2001]
- Bill Cleveland and John Tukey voiced similar thoughts [Cleveland 2001], [Tukey 1962]

But . . .


Some characteristics of data science

Massive data sets
- 10ⁿ observations, where n > (or possibly ≫) 7
- 10ᵐ variables, where m > (or possibly ≫) 3

Modern computing power
- Computers are very cheap today
- Cost per 1MB memory ≈ 3 × 10⁻¹⁰ of cost in 1965¹

Other disciplines are now analysing data too, for example . . .
- Machine learning
- Database management
- Knowledge discovery in databases

¹ http://jcmit.com/memoryprice.htm
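
To give a feel for the scale of the ‘massive data sets’ above, here is a back-of-envelope sketch of the memory needed for a table of 10ⁿ observations by 10ᵐ variables. It assumes every value is stored as an 8-byte number, which is an illustrative assumption only (real commercial data mix types), and the function name `table_size_gb` is hypothetical.

```python
def table_size_gb(n: int, m: int, bytes_per_value: int = 8) -> float:
    """Approximate in-memory size, in GB, of a 10**n by 10**m table.

    Assumes every cell is a fixed-width value (8 bytes by default);
    this is an illustrative assumption, not a property of any real data set.
    """
    cells = 10 ** n * 10 ** m
    return cells * bytes_per_value / 1e9


if __name__ == "__main__":
    # The thresholds mentioned on the slide: n = 7 and m = 3.
    print(table_size_gb(7, 3))  # 80.0
```

Under that assumption, a table at the slide's thresholds would already occupy roughly 80 GB if held in memory all at once.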


‘Components of a successful data science team’²

Skilled professionals needed

1 Data Engineer
‘does not need to be very academic [. . . ] technical competency on the back-end frameworks and tools used for capturing the data points’

2 Machine Learning Expert
‘statistical background, having a deep interest in quantitative topics [. . . ] solid understanding of data algorithms and data structures in specific, and software engineering concepts’

3 Business Analyst
‘an eye for details and [. . . ] exceptional analytical skills [. . . ] solid understanding of the organization’s business model’

The emphases are mine, by the way

² http://www.kdnuggets.com/2015/08/3-components-successful-data-science-team.html August 12 2015


The commercial imperative

Companies want answers that are . . .
- Timely (often have short deadlines)
- Practical (can be used in the business)
- Useful (generates enough revenue)

Companies have . . .
- Mountains of data
- Little time
- Relatively few skilled analysts

Statistics must be taught as a practical subject, or it will be overtaken by other disciplines


Core skills - ‘softer’

Communication
- Influencing
- Appropriate language (often non-statistical!)
- Brevity

Commercial awareness

Time management

Ability to work . . .
- independently
- and / or as part of a ‘non-technical’ team

Problem solving

Creative thinking

And, please, common sense (e.g. the ‘sniff test’)!


Communication

Influencing
- We (probably) need to sell our analysis
- Understand the client’s motivations
  - what does the client want?
  - what does the client need to be told?
- Engage in debate at senior levels
  - can be challenging
  - might not have much time - be brief
- Always have something positive to say

Appropriate language
- Explain in ways the client can understand
- Be careful about statistical jargon, for example . . .
  - ‘error’ likely to be interpreted as ‘mistake’
  - ‘normal’ likely to be interpreted as ‘commonplace’
  - ‘significance’ - statistical or useful?


Core skills - technical

Statistics - knowledge assumed . . .
- ‘Core’ statistics
  - subjects found in undergraduate / masters courses
- Experience of (messy) commercial data
  - these are the reason we need strong EDA skills
- Limitations of traditional tests with large data sets
- Advanced mathematical and computational methods

Coding and / or programming
- Python
- Hadoop
- Weka
- . . . and / or many others . . .

(of course!)
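
One way to illustrate the ‘limitations of traditional tests with large data sets’ is the sketch below. It uses a textbook two-sample z-test (my choice of example, not a method named in the talk) with hypothetical numbers: a fixed difference of 0.01 standard deviations between two groups, tested at increasing sample sizes. The function name `two_sample_z_pvalue` is an assumption for illustration.

```python
import math


def two_sample_z_pvalue(diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means between two equal-sized
    groups with a known common standard deviation (a textbook z-test)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = diff / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical effect: 0.01 standard deviations, which would
    # rarely matter commercially.
    for n in (1_000, 100_000, 10_000_000):
        p = two_sample_z_pvalue(diff=0.01, sd=1.0, n_per_group=n)
        print(f"n per group = {n:>10,}  p-value = {p:.3g}")
```

The effect size never changes, yet the p-value shrinks as n grows, so with enough data almost any difference becomes ‘statistically significant’. Whether it is useful is a separate, commercial question.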


Statistics, big data and the commercial sector

A good starting point
- ‘All models are wrong, but some are useful’ [Box 1979]
- Exploratory / graphical data analysis are crucial [Tukey 1977, Unwin 2015]

We need to teach . . .
- Simple is - often - better than ‘best’
  - by the time we’ve built the ‘best’ model, it’s usually out of date
- The basics are crucial
  - EDA
  - data quality / cleaning
  - visualisation
  - graphical presentation
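
As a small illustration of the ‘data quality / cleaning’ basics, the sketch below profiles one column of raw values before any modelling is attempted. It is a minimal example under stated assumptions: the function name `profile_column` and the sample ‘age’ values are made up for illustration and do not come from the talk.

```python
from collections import Counter
from statistics import median


def profile_column(values):
    """Summarise one column of raw values: count missing entries,
    count entries that are not numeric, and give min / median / max
    of the entries that are."""
    missing = 0
    non_numeric = Counter()
    numbers = []
    for value in values:
        if value is None or str(value).strip() == "":
            missing += 1
            continue
        try:
            numbers.append(float(value))
        except (TypeError, ValueError):
            non_numeric[str(value)] += 1
    summary = {
        "n": len(values),
        "missing": missing,
        "non_numeric": dict(non_numeric),
    }
    if numbers:
        summary.update(min=min(numbers), median=median(numbers), max=max(numbers))
    return summary


if __name__ == "__main__":
    # Made-up values standing in for a messy 'age' field.
    ages = [34, "41", None, "", "unknown", 29, 999, "unknown"]
    print(profile_column(ages))
```

In this made-up example the summary shows two missing entries, two ‘unknown’ entries and a maximum of 999, which is the kind of value that should fail a common-sense check before it reaches a model.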


The skills needed are . . .

1 Communication
  - Influencing
  - Language
  - Brevity
2 Common sense
3 Time management
4 Ability to work with non-technical colleagues
5 Statistics
6 Modelling
  - Exploratory / graphical data analysis / presentation
  - Critical assessment of models / methods
  - Not just statistical assessment
  - ‘Commercial utility’ - simpler is often better
7 Coding

http://www.gordonblunt.co.uk/publications.html


References

Box GEP. Robustness in the strategy of scientific model building. In Launer and Wilkinson (Eds.), Robustness in Statistics, Academic Press, 1979.

Breiman L. Statistical Modeling: The Two Cultures. Statistical Science, Vol 16 No. 3: 199-231, 2001.

Cleveland WS. Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics. International Statistical Review, Vol 69, 21-26, 2001.

Tukey JW. The future of data analysis. Ann. Math. Stat., Vol 33 No. 1: 1-67, 1962.

Tukey JW. Exploratory Data Analysis. Addison-Wesley, 1977.

Unwin A. Graphical Data Analysis with R. CRC Press, 2015.