EXPLORING PROCESS OF DOING DATA SCIENCE VIA AN ETHNOGRAPHIC STUDY OF A MEDIA ADVERTISING COMPANY

J. SALTZ, I. SHAMSHURIN

2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 29 OCTOBER, SANTA CLARA, CA, USA

SCHOOL OF INFORMATION STUDIES | SYRACUSE UNIVERSITY

OUTLINE

Introduction

Related Work

Data Collection

Findings

Observed Issues

Possible Improvements

INTRODUCTION

Data science teams lack an explicit, team-based process methodology. Open questions include:

• What steps should be done first?

• How long should each phase of a project take?

• Which people with what skills should be involved in the project?

RELATED WORK

• Data is increasingly being viewed as a strategic resource for the organization [Wade, 2004].

• Big Data can enable new and improved business models that have not been feasible in the past [Tiefenbacher, 2015].

• Lack of focus on the process teams should use to actually do a data science project [Saltz, 2015]

• Teams doing data analysis and data science work in an ad hoc fashion, using trial and error to identify the right tools [Bhardwaj, 2015]

• Data science as a step-by-step process:
  o Acquisition, information extraction and cleaning, data integration, modeling, analysis, interpretation and deployment [Jagadish, 2014]
  o Preparation, Analysis, Reflection and Dissemination [Guo, 2013]

• One way to understand what might be an appropriate data science process methodology is to document case studies of how teams are actually doing data science, especially within a corporate context [Saltz, 2015]

BACKGROUND AND STAKEHOLDER ANALYSIS

• One of the researchers was embedded within the data science team

• A global media advertising software company headquartered in New York City

• The company had a total of 100 people distributed globally

RESEARCH QUESTIONS

RQ1. What is the current methodology that they follow?

RQ2. What are some possible ways to improve the current methodology, i.e. to make the projects more efficient in time and cost?

DATA COLLECTION

• Phase I: information was collected prior to one of the researchers being embedded within the data science team

• Phase II: over a 9-week period, one of the researchers worked as part of the data science team, collecting data and observing how the team functioned while also helping the team with various tasks

• Phase III: interview with the VP of Data Science

DATA SCIENCE TEAM

• 2 Data Scientists, including VP of Data Science

• 3 Data Operations people

• 3 Software Developers

• 1 Data Engineer

The team was divided across multiple locations

FINDINGS: TYPES OF PROJECTS

• Routine Projects
  o on a regular basis
  o more external
  o data transformation and pre-processing
  o performed by data group
  o deadlines

• Exploratory Projects
  o research oriented
  o no standard methodology is used
  o performed by VP of data science and embedded researcher
  o duration of these projects can vary from a week to a year
  o no official deadlines

FINDINGS: ROLES

• Data Science
  o Explores the data and generates insight from the data, including tasks such as data mining and data visualization. This team included the data scientist who was the embedded observer.

• Data Operations
  o Gets data from data providers, then transforms and prepares the data for analysis (i.e., for use by the data science team)

• Software Development
  o Develops software tools to help the data science team perform data analysis

• Data Engineering
  o Supports and improves the existing system and participates in some data science projects

FINDINGS: HIGH LEVEL PROCESS DESCRIPTION

I. Preparation

• Data
  o Getting data from data provider
  o Quick data validation
  o Transform data

• Business Context
  o Understand needs of business
  o Perform background research

II. Analysis

• What to do and how to do it
• Modeling
• Generating insights from data

III. Dissemination

• Communicate results

PROCESS FLOW DESCRIPTION: PREPARATION

PROCESS FLOW DESCRIPTION: ANALYSIS

PROCESS FLOW DESCRIPTION: DISSEMINATION

OBSERVED ISSUES / CHALLENGES

• No specific deadlines for the whole data science project or for any individual phase of the project

• Project organization and planning

• Whenever the data science team needs to have a task completed, they send a request to the developers, but the developers typically respond the following day

• Developers are involved in several projects at the same time

POSSIBLE PROCESS IMPROVEMENTS

• Documenting the current process

• Better structuring developer interactions

• Imposing deadlines

• Process automation

• Better preparation

FEEDBACK ON SUGGESTIONS

Suggestion                                   Make sense?   Can implement?   Short or long term?
Documenting the current process              4             4                short
Better structuring developer interactions    3             2                long
Imposing deadlines                           4             3                long
Process automation                           4             3                long
Better preparation                           3             3                short
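The feedback table above can be tabulated in a few lines of code. The following is only an illustrative sketch, not part of the study: the `Feedback` class, the field names, and the idea of ranking by the sum of the two scores are assumptions introduced here, and the slide does not state the rating scale, so the numbers are used exactly as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """One row of the feedback table (field names are hypothetical)."""
    suggestion: str
    makes_sense: int    # "Make sense?" score as shown on the slide
    can_implement: int  # "Can implement?" score as shown on the slide
    term: str           # "short" or "long"


# Values copied from the table; the scale itself is not given in the source.
FEEDBACK = [
    Feedback("Documenting the current process", 4, 4, "short"),
    Feedback("Better structuring developer interactions", 3, 2, "long"),
    Feedback("Imposing deadlines", 4, 3, "long"),
    Feedback("Process automation", 4, 3, "long"),
    Feedback("Better preparation", 3, 3, "short"),
]


def short_term(rows):
    """Return the suggestions rated as short-term, in slide order."""
    return [r.suggestion for r in rows if r.term == "short"]


def rank_by_combined_score(rows):
    """Assumed ranking: sort by the sum of both scores, highest first.

    sorted() is stable, so rows with equal sums keep their slide order.
    """
    return sorted(rows, key=lambda r: r.makes_sense + r.can_implement, reverse=True)


if __name__ == "__main__":
    print("Short-term suggestions:", short_term(FEEDBACK))
    for row in rank_by_combined_score(FEEDBACK):
        print(row.makes_sense + row.can_implement, row.suggestion)
```

Summing the two scores is just one possible way to order the suggestions; the presentation itself reports the scores without combining them.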

EFFECTIVE PRACTICES OBSERVED

• Pre-processing

• Frequent dialog with senior management

• Engaging Senior Management

• Using a defined SDLC with the software team

CONCLUSION

• The data science team was not thinking about the process of doing the projects

• Suggestions received positive feedback

• Studying additional organizations might be helpful to examine if the suggestions and feedback from this study are related to the current size, organizational structure or domain of the company, and if there are any patterns observed across the organizations doing data science projects

THANK YOU