Software Engineering in Data Science

Software Engineering for Data Science

what is data science?

"A data scientist is a statistician who lives in San Francisco"

understanding data-mining process

CRISP-DM

Cross Industry Standard Process for Data Mining

Conceived in 1996

Describes six high-level phases of the analytics process

Current version: CRISP-DM 2.0

agile software development

Supports incremental development

Uses iterative work cadences, known as sprints

Fits with CRISP-DM methodology

Allows feedback loop in development

“software should not be developed like an automobile on an assembly line, in which each piece is added in sequential phases.” - Dr. Winston Royce

language support

SCRIPTING VS COMPILED LANGUAGES

SCRIPTS

Interpreted, not compiled

Loosely typed

Can run with errors (a faulty line fails only when it is reached at run time)

Perfect for prototyping & incremental development

Examples: Python, JavaScript, PHP, Ruby, R
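
The scripting-language traits above can be illustrated with a minimal Python sketch. The function names `double` and `run_until_error` are hypothetical and chosen only for this illustration: the first shows loose typing, the second shows that a script keeps working until it actually reaches the faulty step.

```python
def double(value):
    # No declared type: the same function accepts any value that supports `*`.
    return value * 2


def run_until_error(divisors):
    """Divide 10 by each divisor, stopping at the first failure."""
    results = []
    try:
        for d in divisors:
            # A zero divisor only fails when this line reaches it at run time.
            results.append(10 / d)
    except ZeroDivisionError as exc:
        return results, str(exc)
    return results, None


if __name__ == "__main__":
    print(double(3), double("ab"))
    print(run_until_error([1, 2, 0, 5]))
```

The partial results returned before the failure are what make this style convenient for prototyping: earlier steps can be inspected even though a later one is broken.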

language support

COMPILED LANGUAGES

Strict syntax

Compiled before execution

Rely on an underlying framework (compiler toolchain or runtime)

Examples: C, C++, Java

python and r?

Awesome data structures (data frames, vectors, matrices)

Incremental programming

Statistical packages

Web integration (databases, websites, APIs)

Good for quick and dirty work

Can be modularized

Easy to read syntax

BEST PRACTICES

Write rough code (prototyping/proof of concept)

Abstract and separate code into functions

Group functions into library/package

LET’S LOOK AT SOME CODE
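
As a rough sketch of the three best-practice steps (prototype, abstract into functions, group into a package), the Python example below walks the same small task through each stage. The sample data, the function names `clean_values` and `summarize`, and the module name `cleaning.py` are assumptions made for illustration, not code from the original slides.

```python
import statistics

# Step 1: rough prototype, written inline just to prove the idea works.
raw = ["3.5", "4.0", "", "5.5"]
cleaned = [float(x) for x in raw if x != ""]
prototype_mean = sum(cleaned) / len(cleaned)


# Step 2: the same logic abstracted into small, reusable functions.
def clean_values(raw_values):
    """Drop blank entries and convert the remaining strings to floats."""
    return [float(x) for x in raw_values if x.strip() != ""]


def summarize(values):
    """Return basic summary statistics as a dictionary."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


# Step 3: in a real project these functions would move into a module
# (for example a hypothetical cleaning.py inside a package) and be imported
# by analysis scripts instead of being copied between them.
if __name__ == "__main__":
    print(prototype_mean)
    print(summarize(clean_values(raw)))
```

Keeping the prototype result next to the refactored functions makes it easy to check that the refactoring did not change the answer.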
