Data Science from 3,209 Feet John Chandler University of Montana and Ars Quanta

Data Science from 3,209 Feet

John Chandler, University of Montana and Ars Quanta

A Data Scientist Toolkit

• A scripting language (Python, C#, Java, Perl)
• A statistical computing language (R, SAS, SPSS)
• Database languages/environments (MSSQL, Oracle, Postgres, sqlite)
• Distributed computing environment (MapReduce, in many flavors)

Fundamentally we are flipping bits, but this isn’t software development.

CRISP-DM, Shearer, 2000

Tools for data preparation

• A scripting language (Python, C#, Java)
• A statistical computing language (R, SAS, SPSS)
• Database languages/environments (MSSQL, Oracle, Postgres, sqlite)
• Distributed computing environment (MapReduce)
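
As a rough illustration of how two of the tools listed above can work together in data preparation, here is a minimal sketch that cleans records in Python and aggregates them in sqlite. The `orders` table, its columns, the sample rows, and the `prepare_orders` helper are all hypothetical, invented for this example rather than taken from the talk.

```python
import sqlite3


def prepare_orders(rows):
    """Clean raw (customer, amount) pairs and return per-customer totals.

    Hypothetical helper: the table layout and cleaning rules are assumptions.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")

    # Scripting-language step: drop rows with missing amounts and
    # normalize customer names (trim whitespace, lowercase).
    cleaned = [
        (name.strip().lower(), float(amount))
        for name, amount in rows
        if amount not in (None, "")
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?)", cleaned)

    # Database step: let SQL do the grouping and aggregation.
    result = conn.execute(
        "SELECT customer, SUM(amount), COUNT(*) "
        "FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


raw = [(" Alice", "10.5"), ("alice ", "4.5"), ("Bob", None), ("bob", "3")]
totals = prepare_orders(raw)
print(totals)
```

The split shown here (messy string handling in the script, grouping in SQL) is only one reasonable division of labor; the same preparation could be done entirely in either tool.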

CRISP-DM, Shearer, 2000

Advice

• What is the simplest thing that could possibly work?
• Start small and expand scope.
• Use general tools.
• Bring uncertainty into the spotlight.
• Expect iteration.
• Clear-eyed evaluation of not competing on data.
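
One possible reading of the first and fourth points above, sketched in Python: start with a baseline that predicts the training mean, then report its error with a bootstrap interval instead of a single number. The data, the function names `baseline_errors` and `bootstrap_interval`, and the choice of a percentile bootstrap are assumptions made for illustration, not methods described in the talk.

```python
import random
import statistics


def baseline_errors(train, test):
    """Simplest possible model: always predict the training mean."""
    prediction = statistics.mean(train)
    errors = [abs(y - prediction) for y in test]
    return prediction, errors


def bootstrap_interval(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical toy data.
train = [10, 12, 11, 13, 9]
test = [12, 8, 11, 15]

prediction, errors = baseline_errors(train, test)
mae = statistics.mean(errors)
lower, upper = bootstrap_interval(errors)
print(f"baseline prediction: {prediction}")
print(f"mean absolute error: {mae} (95% bootstrap interval: {lower} to {upper})")
```

With only four test points the interval is wide, which is the point: any more elaborate model would need to beat this baseline by more than the uncertainty before it is worth the added complexity.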