On Building a Data Science Curriculum November 23rd, 2014

DESCRIPTION

Data science is a comparatively new field, and it changes constantly as new techniques, tools, and problems emerge. Education has traditionally taken a top-down approach: courses are developed over years, and committees approve curricula based on what might be the most theoretically complete approach. This is at odds with an evolving industry that needs data scientists faster than they can be trained by traditional means. If we are to push the field of data science forward sustainably, we must collectively figure out how best to scale this type of education. At Zipfian I have seen (and felt) firsthand what works (and what doesn't) when tools and theory are combined in a classroom environment. This talk is a narrative about the lessons learned while trying to integrate high-level theory with practical application, how leveraging the Python ecosystem (NumPy, SciPy, pandas, scikit-learn, etc.) has made this possible, and what happens when you treat curriculum like a product (and the classroom like a team).

Page 1: On Building a Data Science Curriculum

On Building a Data Science Curriculum

November 23rd, 2014

Page 2: On Building a Data Science Curriculum

Jonathan Dinu

Director of Education, Galvanize

[email protected]

@clearspandex

Questions? tweet @galvanize

Page 3: On Building a Data Science Curriculum

Formerly

Page 5: On Building a Data Science Curriculum

Currently

Page 6: On Building a Data Science Curriculum

Challenge

The Challenge

Page 7: On Building a Data Science Curriculum

Challenge

Page 8: On Building a Data Science Curriculum

Tools

Axes: Framework/Library vs. Bespoke Code; Big Data (scalability) vs. Small Data

Cloudera ML

Mahout

MLlib (AMPLab)

H2O (0xdata)

C/C++

MapReduce (Streaming)

MapReduce (Java)

Cascading/Crunch

Pig/Hive

Vowpal Wabbit

Giraph

GraphLab

Spark

Storm

CRAN

R

Python

Java

scikit-learn

pandas

mlpack

Weka

NumPy

JavaScript

Page 9: On Building a Data Science Curriculum

Obligatory Name Drop

Pipeline stages: Acquisition, Parse, Storage, Transform/Explore, Vectorization, Train, Model, Expose, Presentation

Locally: requests, BeautifulSoup4, pandas, pymongo, scikit-learn/NLTK, Flask

At Scale: scrapy, Hadoop Streaming (w/ BeautifulSoup4), mrjob or Mortar (w/ Python UDF), Snakebite (HDFS), MLlib (PySpark), Flask
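The stages named on this slide (parse, vectorization, train, model, expose) can be sketched end to end on toy data. This is only an illustration and not code from the talk: it uses the Python standard library in place of the tools listed (html.parser stands in for BeautifulSoup4, and a hand-rolled bag-of-words with a nearest-centroid classifier stands in for scikit-learn). The documents, labels, and function names are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
import math


class TextExtractor(HTMLParser):
    """Parse stage: collect the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse(html):
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


def vectorize(text):
    """Vectorization stage: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled_docs):
    """Train stage: sum word counts per label to get one centroid per class."""
    centroids = {}
    for html, label in labeled_docs:
        centroids.setdefault(label, Counter()).update(vectorize(parse(html)))
    return centroids


def predict(centroids, html):
    """Model/expose stage: return the label whose centroid is most similar."""
    vec = vectorize(parse(html))
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy corpus standing in for acquired and stored pages.
TRAINING = [
    ("<p>the team won the game</p>", "sports"),
    ("<p>a late goal won the match</p>", "sports"),
    ("<p>the market fell as stocks dropped</p>", "finance"),
    ("<p>investors sold stocks and bonds</p>", "finance"),
]

model = train(TRAINING)
prediction = predict(model, "<div>stocks rallied in the market</div>")
print(prediction)
```

In a real project each function would be replaced by the corresponding library from the slide, and `predict` would typically sit behind a small web endpoint (the slide names Flask for that role).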

Page 10: On Building a Data Science Curriculum

Challenge

Page 11: On Building a Data Science Curriculum

Challenge

Now do that in 8 weeks

Page 12: On Building a Data Science Curriculum

Challenge

Page 13: On Building a Data Science Curriculum

Intuition

Iteration 0: Intuition

Page 14: On Building a Data Science Curriculum

Content

Source: Metacademy

Page 15: On Building a Data Science Curriculum

Content

Bottom-Up Approach

Page 16: On Building a Data Science Curriculum

Content

Source: Coursera

Page 17: On Building a Data Science Curriculum

Content

Source: UC Berkeley Masters

Page 18: On Building a Data Science Curriculum

Issues

Not Everybody Learns This Way

Page 19: On Building a Data Science Curriculum

Issues

• Not Enough Context

• Not Enough Concept Overlap

• Takes Too Much Time

• Nothing Happens in a Vacuum

Page 20: On Building a Data Science Curriculum

Digression

Not Just for Data Science

(relevant to learning any complex subject)

Page 21: On Building a Data Science Curriculum

Experience

Iteration 1: Experience

Page 22: On Building a Data Science Curriculum

Theory

Mathematics & Statistics

Mathematics

Statistical Analysis

Distributions (Binomial, Poisson, etc.)

Summary Statistics (Mean, Variance, etc.)

Hypothesis Testing

Bayesian Analysis

Linear Algebra (Matrix Factorization)

Calculus (Integrals, Derivatives, etc.)

Graph Theory

Probability/Combinatorics
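Two of the topics listed on this slide, summary statistics and hypothesis testing, fit in a few lines of standard-library Python. The sketch below is an illustration rather than material from the talk: the cohort scores are made-up numbers, and a permutation test is used as one simple way to test a difference in means.

```python
import random
import statistics


def permutation_test(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the fraction of label shufflings whose absolute mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.mean(left) - statistics.mean(right)) >= observed:
            hits += 1
    return hits / n_resamples


# Hypothetical quiz scores for two cohorts (made-up numbers).
cohort_a = [72, 75, 78, 80, 69, 74, 77, 73]
cohort_b = [85, 88, 82, 90, 86, 84, 89, 87]

mean_a = statistics.mean(cohort_a)
var_a = statistics.variance(cohort_a)  # sample variance (n - 1 denominator)
p_value = permutation_test(cohort_a, cohort_b)
print(mean_a, var_a, p_value)
```

A small p-value here says that a gap this large rarely appears when the cohort labels are shuffled at random, which is the intuition behind hypothesis testing in general.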

Page 23: On Building a Data Science Curriculum

Theory

Worth the Upfront Investment

Page 24: On Building a Data Science Curriculum

Technique

Machine Learning & Software Engineering

Machine Learning

Software Engineering

Distributed Computing

Supervised (SVM, Random Forest)

NLP / Information Retrieval

Algorithms & Data Structures

Data Visualization

Data Munging

Validation, Model Comparison

Unsupervised (K-means, LDA)
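K-means, listed on this slide under unsupervised techniques, is small enough to write out by hand. The following is a minimal sketch of Lloyd's algorithm, not code from the talk. It assumes Python 3.8+ (for `math.dist`), and its deterministic starting centroids (evenly spaced picks from the input) are a simplification chosen so the example is reproducible. The data points are hypothetical.

```python
import math


def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm with a deterministic start (evenly spaced input points)."""
    step = len(points) // k
    centroids = [tuple(points[i * step]) for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical 2-D points forming two obvious groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(data, k=2)
print(centroids, labels)
```

Writing the two steps explicitly (assign, then update) makes it easier to see what a library implementation does behind a single `fit` call.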

Page 25: On Building a Data Science Curriculum

Network

Just ask them!

(the students)

Page 26: On Building a Data Science Curriculum

Context is King

Page 27: On Building a Data Science Curriculum

Network

Page 35: On Building a Data Science Curriculum

Network

Iris Dataset Classification: “Domesticated Data” (learn the tools/theory)

NYT Topic Modeling: “Wild Data” (learn the application)

Real-time Fraud scoring service: Simulated Case Study (learn the process)

Personal Capstone: Greenfield Project (learn the practice/art)

Page 36: On Building a Data Science Curriculum

Theory

Theory

Application

Synthesis

$$$ PROFIT!!

Page 37: On Building a Data Science Curriculum

Network

Just ask them!

Page 38: On Building a Data Science Curriculum

Network

Page 39: On Building a Data Science Curriculum

Network

Just ask them! (and be flexible)

Page 40: On Building a Data Science Curriculum

Network

Treat them like customers (because they are)

Page 41: On Building a Data Science Curriculum

Network

Always Validate!

Page 42: On Building a Data Science Curriculum

Metrics

Iteration 2: Data!

Page 43: On Building a Data Science Curriculum

Experience

Iteration 2: Data!

METRICS

METRICS EVERYWHERE

Saturday, April 9, 2011

Page 44: On Building a Data Science Curriculum

Metrics

Page 45: On Building a Data Science Curriculum

Metrics

• Commits

• Pull Requests

• Passing Tests

• Etc.
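The metrics listed on this slide (commits, pull requests, passing tests) lend themselves to a simple per-student summary. The slides do not say how the numbers were collected or used, so everything below is a hypothetical sketch: the record fields, the sample values, and the check-in thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class WeeklyActivity:
    """One student's activity for one week (all fields are hypothetical)."""
    student: str
    commits: int
    pull_requests: int
    tests_passed: int
    tests_total: int

    @property
    def pass_rate(self):
        return self.tests_passed / self.tests_total if self.tests_total else 0.0


def needs_check_in(activity, min_commits=5, min_pass_rate=0.6):
    """Flag a student whose activity falls below illustrative thresholds."""
    return activity.commits < min_commits or activity.pass_rate < min_pass_rate


# Made-up sample data for three students.
week = [
    WeeklyActivity("student_a", commits=23, pull_requests=4, tests_passed=18, tests_total=20),
    WeeklyActivity("student_b", commits=3, pull_requests=1, tests_passed=9, tests_total=20),
    WeeklyActivity("student_c", commits=11, pull_requests=2, tests_passed=15, tests_total=20),
]

flagged = [a.student for a in week if needs_check_in(a)]
print(flagged)
```

A flag like this is best read as a prompt for a conversation with the student, not as a grade.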

Page 46: On Building a Data Science Curriculum

Curriculum as Product

Page 47: On Building a Data Science Curriculum

Learning Techniques

Page 48: On Building a Data Science Curriculum

Industry Techniques

Source: http://en.wikipedia.org/wiki/Extreme_programming

Page 49: On Building a Data Science Curriculum

Industry Techniques

Source: http://lostechies.com/scottreynolds/2009/10/07/how-we-do-things-tdd-bdd/

Page 50: On Building a Data Science Curriculum

Industry Techniques

Code Reviews

Source: http://agile.dzone.com/articles/re-pair-programming

Page 51: On Building a Data Science Curriculum

Our House

@Zipfian (now Galvanize)

Page 52: On Building a Data Science Curriculum

Source: http://www.sebastienmillon.com/Rainbow-Immersion-Therapy-Art-Print-15

Page 53: On Building a Data Science Curriculum

Methodology

Community

Education

Industry

Meetup

Student Groups

Corporate Training

Page 54: On Building a Data Science Curriculum

Methodology

• Outcomes focused

• Project-based curriculum using real datasets

• Guest lectures from leaders in the field

• Mock interviews and hiring preparation

• Full instructional staff + personal mentorship

Page 55: On Building a Data Science Curriculum

Employment

Source: http://www.nerdwallet.com/nerdscholar/grad_surveys/highest-employment-rates

Highest Employment Rates (2012)

University of Massachusetts-Amherst School of Nursing: 98%
Georgetown University McDonough School of Business: 94%
Michigan State University College of Nursing: 92%
Syracuse University School of Architecture: 90%
University of Massachusetts-Amherst Isenberg School of Management: 90%
Michigan State University School of Hospitality Business: 89%
New York University: 88%
Boston College Connell School of Nursing: 88%
Boston College Carroll School of Management: 87%
Case Western Reserve University Frances Payne Bolton School of Nursing: 86%

U.S. News and World Report Ranking

1. Princeton University
2. Harvard University
3. Yale University
4. Columbia University
5. Stanford University
6. University of Chicago
7. Duke University
8. MIT
9. University of Pennsylvania
10. California Institute of Technology

Page 56: On Building a Data Science Curriculum

Timeline

Data Science Immersive

Milestones: STRUCTURED CURRICULUM, HIRING DAY, CAPSTONE PROJECT, GRADUATION, INTERVIEWS

Axis ticks: 0, 8, 10.5, 12

Page 57: On Building a Data Science Curriculum

Industry Student Projects

Page 58: On Building a Data Science Curriculum

Our Students

What We Look For

• Working knowledge of programming

• Background in a quantitative discipline

• Comfortable with mathematics and statistics

• Childlike curiosity

Page 59: On Building a Data Science Curriculum

Our Students

[Bar chart: Educational Background; categories: BS, MS, PhD; axis 0 to 16]

Page 60: On Building a Data Science Curriculum

Our Students

[Bar chart: Disciplines; categories: Software Engineering, Analysts, Finance/Economics, Engineering, Physics, Physical Sciences, Mathematics, Statistics, Astronomy, Linguistics, Professional Poker; axis 0 to 8]

Page 61: On Building a Data Science Curriculum

Data Science Immersive

Masters in Data Science

Data Engineering Immersive

Weekend Workshops

Page 62: On Building a Data Science Curriculum

Immersive

Masters

Page 63: On Building a Data Science Curriculum

Immersive

Masters

(not to scale)

Page 64: On Building a Data Science Curriculum

Master of Science: 1 year (starts in spring)

http://www.galvanizeu.com/request-info

Page 65: On Building a Data Science Curriculum

Goals

Get Involved

• Present a guest lecture or share a data story

• Donate datasets and propose projects

• Sponsor a scholarship

• Attend our Hiring Day

Page 66: On Building a Data Science Curriculum

Goals

We’re Hiring!

• Full-time Instructors

• TAs

• Mentor (volunteer)

Page 67: On Building a Data Science Curriculum

Questions?

Thank You!

Jonathan Dinu

Director of Education, Galvanize

[email protected]

@clearspandex