chris-orwa
Software Engineering for Data Science
what is data science?
"A data scientist is a statistician who lives in San Francisco"
understanding data-mining process
CRISP-DM
Cross Industry Standard Process for Data Mining
Conceived in 1996
Describes six high-level phases of the analytics process
Current version: CRISP-DM 1.0 (a 2.0 update was announced but never released)
agile software development
Supports incremental development
Uses iterative work cadences, known as sprints
Fits with CRISP-DM methodology
Allows a feedback loop during development
“software should not be developed like an automobile on an assembly line, in which each piece is added in sequential phases.” - Dr. Winston Royce
language support
SCRIPTING VS COMPILED LANGUAGES
SCRIPTS
Interpreted, not compiled
Dynamically typed
Can run with errors (a mistake only surfaces when the faulty line is executed)
Perfect for prototyping & incremental development
Examples: Python, JavaScript, PHP, Ruby, R
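A minimal Python sketch of the "can run with errors" point, assuming made-up function names (`summarize`, `run_script`) and made-up data: the first two steps complete normally, and the type error in the third step is only noticed when that line is actually executed.

```python
def summarize(values):
    """Return the mean of a list of numbers (hypothetical helper)."""
    return sum(values) / len(values)


def run_script():
    """Run three steps; the last one contains a deliberate type error."""
    completed = []
    completed.append(summarize([1, 2, 3]))   # runs fine
    completed.append(summarize([10, 20]))    # runs fine
    try:
        # A string mixed in with numbers: nothing complains until
        # the interpreter actually reaches and executes this call.
        completed.append(summarize([1, "2", 3]))
    except TypeError as exc:
        completed.append(f"failed at run time: {type(exc).__name__}")
    return completed


results = run_script()
print(results)
```

A compiler for a statically typed language would typically reject the equivalent of the third call before the program starts at all.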
language support
COMPILED LANGUAGES
Strict syntax
Compiled before execution, so many errors are caught before the program runs
Use an underlying framework
Examples: C, C++, Java
python and r?
Awesome data structures (data frames, vectors, matrices)
Incremental programming
Statistical packages
Web integration (databases, websites, APIs)
Good for quick and dirty work
Can be modularized
Easy to read syntax
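As a rough illustration of the data-structure and statistics points, here is a small sketch that uses only the Python standard library. The rows, column names, and the helpers `column` and `group_means` are invented for this example; in practice a data-frame library such as pandas would normally play this role.

```python
import statistics

# A tiny "data frame" modelled as a list of row dictionaries (made-up data).
rows = [
    {"name": "a", "height_cm": 150.0, "group": "x"},
    {"name": "b", "height_cm": 160.0, "group": "x"},
    {"name": "c", "height_cm": 170.0, "group": "y"},
    {"name": "d", "height_cm": 180.0, "group": "y"},
]


def column(table, key):
    """Extract one column of the table as a plain list (a 'vector')."""
    return [row[key] for row in table]


def group_means(table, group_key, value_key):
    """Compute the mean of value_key for each distinct value of group_key."""
    groups = {}
    for row in table:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {name: statistics.mean(values) for name, values in groups.items()}


heights = column(rows, "height_cm")
mean_height = statistics.mean(heights)
median_height = statistics.median(heights)
means_by_group = group_means(rows, "group", "height_cm")

print(mean_height, median_height, means_by_group)
```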
BEST PRACTICES
Write rough code (prototyping/proof of concept)
Abstract and separate code into functions
Group functions into library/package
LET’S LOOK AT SOME CODE
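The three best practices above can be sketched in one short Python file. This is only an illustration under stated assumptions: the data, the function names `parse_numbers` and `mean`, and the package path mentioned in the comments are all hypothetical.

```python
# Step 1: rough code. Written inline, just to prove the idea works.
raw = ["3.5", "4.0", "", "5.5"]
cleaned = []
for item in raw:
    if item != "":
        cleaned.append(float(item))
rough_mean = sum(cleaned) / len(cleaned)


# Step 2: abstract and separate the same logic into functions.
def parse_numbers(raw_values):
    """Convert strings to floats, skipping blank entries."""
    return [float(value) for value in raw_values if value.strip()]


def mean(values):
    """Return the arithmetic mean; reject empty input explicitly."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


# Step 3: group the functions into a library/package.
# In a real project these two functions might move into a module such as
# mytools/cleaning.py (hypothetical path) and be imported where needed:
#     from mytools.cleaning import parse_numbers, mean
tidy_mean = mean(parse_numbers(raw))

print(rough_mean, tidy_mean)
```

The refactored version gives the same answer as the prototype, but each piece can now be tested and reused on its own.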