Data Science Project Lifecycle

Jason Geng @Data Application Lab

Miya Du @Data Science Association

- Business Requirement
- Data Acquisition
- Data Preparation
- Hypothesis & Modeling
- Evaluation & Interpretation
- Deployment
- Operations
- Optimization

Business Requirements

- Data scientists need to work with business stakeholders and with people who have expertise in the data, so that they understand both the data and the business
- Specify the business requirements
- For instance, with healthcare data:
  e.g. ‘DISCWT’: ‘This is the discharge-level weight used on the HCUP nationwide data to produce national estimates’

Understand the business: Goal: Predict readmission rate

Understand the data: Database: Healthcare Readmissions Database

Modeling
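The DISCWT note above is what turns a sample of hospital discharges into national estimates: each sampled discharge stands for DISCWT discharges nationally, so a national rate is the weighted sum of readmissions divided by the sum of weights. A minimal sketch, assuming each record already carries a hypothetical 0/1 `readmitted` flag (in the real database, deriving that flag means linking a patient's stays over time, which is not shown):

```python
# Minimal sketch: weighting sampled discharges up to a national estimate.
# Assumption: each record is a dict with the HCUP weight "DISCWT" and a
# hypothetical, already-derived 0/1 "readmitted" flag.

def weighted_readmission_rate(records):
    """Return (estimated discharges, estimated readmissions, weighted rate)."""
    total_weight = sum(r["DISCWT"] for r in records)
    weighted_readmits = sum(r["DISCWT"] * r["readmitted"] for r in records)
    if total_weight == 0:
        raise ValueError("records carry no weight")
    return total_weight, weighted_readmits, weighted_readmits / total_weight


# Hypothetical toy sample, not real HCUP data.
sample = [
    {"DISCWT": 2.0, "readmitted": 1},
    {"DISCWT": 3.0, "readmitted": 0},
    {"DISCWT": 5.0, "readmitted": 1},
]
discharges, readmits, rate = weighted_readmission_rate(sample)
print(discharges, readmits, rate)  # 10.0 7.0 0.7
```

An unweighted average of the same flags would give 2/3; the weights shift the estimate toward the discharges that represent more of the national population.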

Data Collection

- Data from the product line
- Purchased third-party data
- Social media (Facebook, LinkedIn)
- Web crawling
- Open data (e.g., Opendata, U.S. Census data)
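For the web-crawling source, the core step of a crawler is pulling links out of a fetched page so they can be queued for the next fetch. A minimal sketch using only Python's standard library; the page content and URLs are hypothetical, and the fetching itself (for example with `urllib.request`) plus honoring `robots.txt` are left out:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page; a crawler would add these URLs to its fetch queue.
page = '<p><a href="/data.csv">CSV</a> <a href="https://example.org/x">x</a></p>'
links = extract_links(page, "https://example.com/open-data/")
print(links)  # ['https://example.com/data.csv', 'https://example.org/x']
```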

Challenges: data storage and data management

Figure: data sources (legacy data, OLTP, web logs, web crawler, open source data, third-party data, social media data) arrive in formats such as XML, CSV, LOG, and SQL, and serve the product line, business intelligence, and data science apps.
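One way to handle the storage and management challenge is to normalize each format into a common record shape as it arrives, so downstream business intelligence and data science apps see one structure. A minimal sketch with the Python standard library; the field names, the log layout, and the sample inputs are hypothetical:

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET


def from_csv(text):
    """CSV export -> list of dicts keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_xml(text, record_tag="record"):
    """XML feed -> one dict per <record> element, keyed by child tag."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in rec} for rec in root.iter(record_tag)]


def from_log(text):
    """Web log -> dicts; assumes a hypothetical '<timestamp> <user_id> <action>' layout."""
    records = []
    for line in text.splitlines():
        if line.strip():
            timestamp, user_id, action = line.split(maxsplit=2)
            records.append({"timestamp": timestamp, "user_id": user_id, "action": action})
    return records


def from_sql(conn, query):
    """SQL (OLTP or legacy) table -> dicts keyed by column name."""
    cur = conn.execute(query)
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


# Toy inputs standing in for the four formats.
csv_text = "user_id,action\nu1,view\n"
xml_text = "<records><record><user_id>u2</user_id><action>buy</action></record></records>"
log_text = "2017-02-16T10:00:00 u3 click\n"
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user_id TEXT, action TEXT)")
db.execute("INSERT INTO events VALUES ('u4', 'login')")

all_records = (from_csv(csv_text) + from_xml(xml_text) + from_log(log_text)
               + from_sql(db, "SELECT user_id, action FROM events"))
print([r["user_id"] for r in all_records])  # ['u1', 'u2', 'u3', 'u4']
```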

Data Preparation (Data Wrangling)

- Cleaning data (semantic errors, missing entries, or inconsistent formatting)
- Challenge: data integration
- Can take about 80% of the time in a project workflow
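The three cleaning problems listed above can be sketched as a single pass over the rows. This assumes hypothetical `gender`, `admit_date`, and `age` fields and two accepted date formats; rejected rows are kept with a reason rather than silently dropped:

```python
from datetime import datetime

# Assumed input date formats; a real feed may use others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(value):
    """Normalize inconsistently formatted date strings to a date, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_int(value):
    """Return an int, or None for missing or non-numeric entries."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def clean(rows):
    cleaned, rejected = [], []
    for row in rows:
        gender = (row.get("gender") or "").strip().upper() or None  # " f " -> "F"
        admit = parse_date(row.get("admit_date") or "")
        age = parse_int(row.get("age"))
        if admit is None or age is None:      # missing or unparseable entries
            rejected.append((row, "missing"))
        elif not 0 <= age <= 120:             # semantic error: parses, but impossible
            rejected.append((row, "semantic"))
        else:
            cleaned.append({"gender": gender, "admit_date": admit, "age": age})
    return cleaned, rejected


rows = [
    {"gender": " f ", "admit_date": "2017-02-16", "age": "45"},
    {"gender": "M", "admit_date": "02/16/2017", "age": "60"},
    {"gender": "F", "admit_date": "", "age": "30"},
    {"gender": "M", "admit_date": "2017-02-16", "age": "-3"},
]
cleaned, rejected = clean(rows)
print(len(cleaned), [reason for _, reason in rejected])  # 2 ['missing', 'semantic']
```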

Figure: multiple data sources (Data Source A, Data Source B, and others) feed an ETL process that loads a data warehouse.
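The ETL flow in the figure can be sketched in a few lines: extract rows from each source, transform them to one schema (here, uppercase IDs and charges in dollars), and load them into a warehouse table. An in-memory SQLite database stands in for the warehouse; both sources and their field names are hypothetical:

```python
import sqlite3

# Extract: two hypothetical sources with different field names and units.
source_a = [{"patient_id": "p1", "charge_usd": "1200.50"}]
source_b = [{"PatientID": "P2", "ChargeCents": 98000}]


# Transform: map each source onto one schema (patient_id, charge_usd).
def transform_a(row):
    return (row["patient_id"].upper(), float(row["charge_usd"]))


def transform_b(row):
    return (row["PatientID"].upper(), row["ChargeCents"] / 100)


# Load: write the unified rows into the warehouse table.
def load(conn, rows):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS charges (patient_id TEXT PRIMARY KEY, charge_usd REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO charges VALUES (?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
rows = [transform_a(r) for r in source_a] + [transform_b(r) for r in source_b]
load(warehouse, rows)
result = warehouse.execute(
    "SELECT patient_id, charge_usd FROM charges ORDER BY patient_id"
).fetchall()
print(result)  # [('P1', 1200.5), ('P2', 980.0)]
```

`INSERT OR REPLACE` on the primary key makes reloading the same batch idempotent, which is one common choice when ETL jobs are rerun.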

Feature Engineering

Feature engineering cycle:
- Select or create features
- Research feature relevance
- Experiment and validate
- Change the feature set
- Go back to the feature selection step
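The feature engineering loop above resembles greedy forward selection: try each candidate feature, validate, keep the one that helps most, and return to the selection step until nothing improves. A minimal sketch; `toy_score` and its per-feature values are hypothetical stand-ins for a real validation metric such as a cross-validated score:

```python
def forward_selection(candidates, score, min_gain=0.0):
    """Greedy forward selection.

    candidates: iterable of feature names.
    score: callable taking a list of feature names and returning a
           validation score (higher is better).
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        # Experiment and validate: try adding each remaining feature.
        trials = [(score(selected + [f]), f) for f in remaining]
        trial_score, feature = max(trials)
        if trial_score - best <= min_gain:
            break  # no candidate improves the validated score enough
        # Change the feature set, then go back to the selection step.
        selected.append(feature)
        remaining.remove(feature)
        best = trial_score
    return selected, best


# Hypothetical usefulness of each feature; a real score would refit a model.
USEFULNESS = {"age": 0.10, "prior_admissions": 0.25, "zip_code": -0.02}


def toy_score(features):
    return 0.5 + sum(USEFULNESS[f] for f in features)


selected, best = forward_selection(USEFULNESS, toy_score)
print(selected)  # ['prior_admissions', 'age']
```

Forward selection is only one way to run this loop; backward elimination or model-based importance would fit the same cycle.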

Modeling

Reference: scikit-learn machine learning map ("choosing the right estimator"), http://scikit-learn.org/stable/tutorial/machine_learning_map/
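The referenced map walks through a few questions (sample size, whether you are predicting a category or a quantity, whether the data is labeled) to suggest a starting estimator. A rough encoding of its top-level branches, returning only the first suggestion on each path; this is a simplified reading of the map, and the labels may differ in newer versions of the page:

```python
def suggest_estimator(n_samples, predicting, labeled=True,
                      n_categories_known=True, few_features_important=False):
    """First suggestion along a simplified reading of the scikit-learn map.

    predicting: "category", "quantity", or anything else for "just looking".
    """
    if n_samples <= 50:
        return "get more data"
    if predicting == "category" and labeled:          # classification
        return "LinearSVC" if n_samples < 100_000 else "SGDClassifier"
    if predicting == "category":                       # clustering
        if n_categories_known:
            return "KMeans" if n_samples < 10_000 else "MiniBatchKMeans"
        return "MeanShift / VBGMM" if n_samples < 10_000 else "tough luck"
    if predicting == "quantity":                       # regression
        if n_samples >= 100_000:
            return "SGDRegressor"
        if few_features_important:
            return "Lasso / ElasticNet"
        return "RidgeRegression / SVR(kernel='linear')"
    return "Randomized PCA"                            # dimensionality reduction


# Hypothetical readmission setup: labeled yes/no outcome, 5,000 discharges.
print(suggest_estimator(5_000, "category"))  # LinearSVC
```

The map also says what to try when the first suggestion is "not working"; those fallbacks are omitted here.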

Deploy to Product Line
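Deploying to the product line typically means serializing the trained model on the training side and loading it once in the serving process. A minimal sketch with `pickle`; `ReadmissionModel`, its weights, and the version tag are hypothetical, and pickled data should only be loaded from trusted sources:

```python
import pickle
from dataclasses import dataclass


@dataclass
class ReadmissionModel:
    """Hypothetical trained model: a linear score with a decision threshold."""
    weights: dict
    bias: float
    threshold: float = 0.5

    def predict(self, features):
        score = self.bias + sum(w * features.get(name, 0.0)
                                for name, w in self.weights.items())
        return int(score >= self.threshold)


# Training side: serialize the fitted model with a version tag.
model = ReadmissionModel(weights={"prior_admissions": 0.3, "age": 0.005}, bias=-0.2)
artifact = pickle.dumps({"version": "v1", "model": model})

# Product-line side: load the artifact once, then serve predictions.
loaded = pickle.loads(artifact)
serving_model = loaded["model"]
print(loaded["version"], serving_model.predict({"prior_admissions": 2, "age": 70}))  # v1 1
```

Storing a version tag next to the model makes it possible to tell which model produced a prediction when it is later retrained during operations and optimization.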

Thank you!

https://www.DataAppLab.com

Feb 2017

PPT: Xiaolu Zhao @ Feb 16, 2017
