Building Data Science Teams
David Dietrich, Advisory Technical Education Consultant, EMC Education Services (@imdaviddietrich)
Predictive Analytics World Boston, September 30, 2013
© Copyright 2013 EMC Corporation. All rights reserved.

Building Data Science Teams - Meetup: files.meetup.com/1676436/BuildingDataScienceTeams.pdf



Quick Profile

•  Architect of EMC’s Data Science curriculum
•  Co-author of 3 courses on Big Data and Data Science (with FOSS)
•  Filed 9 patents on data science, data privacy, and cloud computing
•  Advisor to universities on analytic programs (Babson, Harvard…)

Twitter: @imdaviddietrich


Today’s Discussion Questions

•  When do you need to build a team? How do you know? What are the needs? What kind of people do you need on the team?
•  Organizational model? Where should the data science team live within an organization?
•  Is it always necessary to build a team? When to build, buy or partner?
•  Who are the sponsors in large companies?


Creating Reports, Dashboards, and Databases… Do You Need A Data Science Team For This?

[Figure: sample report with a bar chart (vertical axis 0 to 16) and a table of Registrant and Offered counts by region (Americas, APJ, EMEA, Unprovided) for five courses: NetWorker ICA, RP Sys Deploy, Symm Config Mgmt, vSphere ICM, VNX US Impl]

Totals by course, in the order above: Total Registrant 3, 10, 1, 16, 6; Total Offered 2, 7, 1, 2, 1

Key:
- EMC NetWorker Installation, Configuration and Administration
- RecoverPoint System Deployment
- Symmetrix Configuration Management
- VMware vSphere: Install, Configure, Manage [V5.1]
- VNX Unified Storage Implementation


How About For This? Creating a Map of the Internet


Example: Output From a Data Science Team Mapping The Spread of Innovation Ideas Using Social Graphs


Mini Case Study: EMC GINA Project

Case Study Summary: EMC GINA

Initial Questions
•  Can we find out how information spreads across a worldwide company?
•  How can we figure out the main topics discussed in verbal conversations with universities in different countries?
•  Which people are providing thought leadership and influence, and how would we know?

Findings
•  Advanced analytics identified “hidden innovators” in China and Ireland – people with a disproportionate influence on ideas at EMC
•  Ireland in particular had a high density of innovation award finalists, due to training and support of general management


Data Science Teams


Framework for Developing Data Science Teams

•  Data Science Team: Data Scientist, BI Analyst, Project Sponsor, Project Manager, Business User, Data Engineer, DBA


Successful Analytic Projects Require Breadth of Roles

•  Business User
•  Project Sponsor
•  Project Manager
•  Business Intelligence Analyst
•  Data Engineer
•  Database Administrator (DBA)
•  Data Scientist



Mini Case Study: EMC GINA Project

Data Science Team for GINA Project
•  Volunteer Data Scientists, with graduate level backgrounds in machine learning, NLP, and complex network analysis
•  IT dept provided DBA and Data Engineer skills
•  Project sponsor (VP in CTO office) provided overall oversight (project mgmt, domain knowledge, business user perspective, and evangelism)
•  Created analytic sandbox for team members to use and analyze data


Developing Data Science Capabilities


Framework for Developing Data Science Teams

•  Data Science Team: Data Scientist, BI Analyst, Project Sponsor, Project Manager, Business User, Data Engineer, DBA
•  Developing Data Science Capabilities: Transforming, Creating, As-a-Service, Crowdsourcing


Four Approaches to Developing Data Science Capabilities

•  Transforming
•  Creating
•  As a Service
•  Crowdsourcing



Mini Case Study: EMC GINA Project

Developing Data Science Capabilities for GINA Project
•  Capabilities model a mix of Crowdsourcing and aaS (“volunteer aaS model”)
•  Team formed for a specific project, several Data Scientists working on different aspects


Organizational Model


Framework for Developing Data Science Teams

•  Data Science Team: Data Scientist, BI Analyst, Project Sponsor, Project Manager, Business User, Data Engineer, DBA
•  Developing Data Science Capabilities: Transforming, Creating, As-a-Service, Crowdsourcing
•  Organizational Model: Centralized, Decentralized, Hybrid


Organizational Models for Data Science Teams

•  Centralized: The Data Science Team Functions As A Hub And Spoke Model; Central Provider Of Analytics To Multiple Business Units
•  Decentralized: Each Business Unit Has Its Own Data Science Capabilities
•  Hybrid: Centralized Data Science Team, But Business Units Also Have Data Science Capabilities

Regardless Of Which Approach, They All Need Executive Sponsorship To Succeed


Executive Engagement


Framework for Developing Data Science Teams

•  Executive Engagement: Data-driven CEO, Chief Data Officer
•  Data Science Team: Data Scientist, BI Analyst, Project Sponsor, Project Manager, Business User, Data Engineer, DBA
•  Developing Data Science Capabilities: Transforming, Creating, As-a-Service, Crowdsourcing
•  Organizational Model: Centralized, Decentralized, Hybrid


Analytics Requires Executive Level Engagement

“Executive Sponsorship Is So Vital To Analytical Competition…” -- Tom Davenport (Competing on Analytics)

Executive Boardroom (CEO)
•  Chief Finance Officer: Use Time Series Analysis Over Historical Data to Predict KPIs to Project Earnings
•  Chief Security Officer: Collect and Mine Log Data Within and Outside of the Company to Detect Unknown Threats
•  Chief Operating Officer: Mine Customer Opinions and Competitor Behaviors to Predict Inventory Demands
•  Chief Strategy Officer: Simulate Outcomes for Acquiring Our Top 3 Competitors
•  Chief Product Officer: Conduct Social Media Analyses to Identify Customer Opinions
•  Chief Marketing Officer: Conduct Behavior Analyses to Predict If Customers Are Going to Churn



Mini Case Study: EMC GINA Project

Executive Engagement for GINA Project
•  VP in CTO office evangelized the project, provided support and resources
•  Helped to raise awareness of project at executive levels of the company


EMC Courses on Data Science & Big Data Analytics

Durations: 90 min, 1 day, 5 days

Audiences: Aspiring Data Scientists, Business Leaders, Heads of Data Science Teams

Courses (two are marked “New”):
•  Data Science and Big Data Analytics
•  Data Science and Big Data Analytics for Business Transformation
•  Introducing Data Science and Big Data Analytics for Business Transformation


Closing Thoughts…

Now You Know How To Develop Data Science Teams… What Next?

•  Determine How You Would Like To Develop Data Science Capabilities
•  Hire People To Fill Out Your Data Science Team
•  Consider Which Organizational Model Will Work Best For Your Situation
•  Assess How Much Executive Engagement You Have Or Need
•  Map Out Potential Projects -- Balance Quick Wins With Longer-term Wins


Questions?

Twitter: @imdaviddietrich

Additional Resources:

Blogs
•  My Blog on Data Science & Big Data Analytics: http://infocus.emc.com/author/david_dietrich/
•  Blog series on applying Data Analytics Lifecycle to measuring innovation data: http://stevetodd.typepad.com/my_weblog/data-science-and-big-data-curriculum/

Courses
•  5-day course, Data Science & Big Data Analytics (for Data Scientists): http://bit.ly/15pxLLo
•  1-day course, Data Science & Big Data Analytics for Business Transformation (for leaders): http://bit.ly/uINB5U


Executive Engagement: Data-Driven CEO

Key Focus Areas of a Data-driven CEO:

•  Strategic Data Planning

•  Analytic Understanding

•  Technology Awareness

Procter & Gamble Business Sphere

“… If Your Organization Can Arrange It … Have Someone In A Key Operational Role -- Business Unit Head, Chief Operations Officer, Even CEO -- To Be An Enthusiastic Advocate Of Matters Quantitative.”

-- Tom Davenport (HBR Blog Network)


Executive Engagement: Chief Data Officer (CDO)

•  Promote Data-driven Decision Making To Support Company’s Key Initiatives

•  Ensure The Company Collects the Right Data

•  Oversee and Drive Analytics Company-wide

“… It's Time For Corporations To Embrace A New Functional Member Of The C-suite: The Chief Data Officer (CDO).”

-- Anthony Goldbloom and Merav Bloch, Kaggle

25% of organizations will have a Chief Data Officer by 2015.

-- Gartner Blog Network

Executive Boardroom

Executive-level Advisor On Data Analytics


Approaches to Developing Data Science Capabilities: Transforming Teams

•  Industries Requiring Deep Domain Knowledge (Such As Genetics And DNA Sequencing)

•  Established Companies Who Wish To Introduce Data Science Into Their Business

•  Companies Who Wish To Enrich The In-house Skill Sets

Transforming And Realignment With Minimal Change To The Current Organizational Structure


Approaches to Developing Data Science Capabilities: Creating Teams

•  Start-up Companies

•  Companies Who Wish To …
    –  Increase Their Focus On Data Analytics
    –  Start New Data Science Projects

•  Companies Where Data Is The Product

•  Deep Domain Knowledge Is Less Critical For The Analytics  

Developing A New Team From Scratch


Approaches to Developing Data Science Capabilities: Data Science as a Service

•  When To Engage DSaaS Providers
    –  Prefer Not To Change Existing Organizational Structure
    –  When Creating Or Transforming Are Not Viable Options

•  Consider Service-level Agreements (SLAs) When Determining Whether To Engage Internal Resources Or External Providers

Engaging Data Science as a Service (DSaaS)


Approaches to Developing Data Science Capabilities: Crowdsourcing Data Science

•  When To Crowdsource
    –  The Problem Is “Open” In Nature
    –  Willing To Accept Opinions From Distributed And Diverse Groups Of People
    –  There’s A Back-up Plan In Case Of “Crowd Failures”

•  Examples: Wikipedia, Netflix’s $1,000,000 Prize

Outsource Data Science Project To Distributed Groups Of People


Approaches to Developing Data Science Capabilities: Crowdsourcing Data Science (Cont’d)

•  Different Crowdsourcing Models
    –  Wisdom Of Crowds
    –  Swarm Creativity (Collective Intelligence)

•  Crowdsourcing Platforms
    –  Kaggle.com, Innocentive.com
    –  Amazon Mechanical Turk

•  Crowd Failures: When The Turnout Of Crowdsourcing Is Unsatisfactory

Outsource Data Science Projects To Distributed Groups Of People