David Onder, Director of Assessment
Casey Iannone, Consultant
Background
Freshmen Only
The Problem
• We are asked for program-level data
• Lack of program-level retention and graduation rates
• Complicated (particularly for undergrads)
• Different programs serve different purposes
• High stakes – program prioritization (AA driven)
• Reports lumped all non-retained together (whether they graduated or stopped out)
Reflecting on the past
What We Wanted
• Solid & simple approach (easy to explain and defend)
• Fair
• Useful for all types of programs
• Meaningful for decision-making (high- and low-level)
• Not an overly complicated display
• Illuminates
• Overall performance
• Historic trends
• When students are lost
• Something that can be generated yearly without too much effort
5 Outcomes
• Five possible outcomes for each student who declares a given major
• Retained in program
• Graduated in program
• Retained in different program
• Graduated in different program
• Not retained (stop-out/drop-out)
• Exclusive and exhaustive
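The five flags can be assigned with a simple decision ladder. A minimal sketch (in Python rather than the R used later in the deck; the function and argument names are hypothetical):

```python
def outcome_flag(enrolled_in, graduated_in, cohort_program):
    """Classify one cohort member at one semester interval.

    enrolled_in:    program the student is enrolled in this interval, or None
    graduated_in:   program the student has graduated from, or None
    cohort_program: the program whose cohort is being reported

    Exactly one branch fires, so the five outcomes are
    exclusive and exhaustive by construction.
    """
    if graduated_in == cohort_program:
        return "Graduated in program"
    if graduated_in is not None:
        return "Graduated in different program"
    if enrolled_in == cohort_program:
        return "Retained in program"
    if enrolled_in is not None:
        return "Retained in different program"
    return "Not retained (stop-out/drop-out)"
```

Graduation is checked before retention, so a student who finishes one program while enrolled in another still counts as a completer.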
General Approach
• Based on cohorts:
• A student is placed in a program cohort the 1st time they declare a given program
• Each student in the cohort is flagged as one of the 5 possible outcomes for each ½ year interval (each regular semester)
• At each interval we report where the members of the cohort fall
• Each student will only appear in one cohort for a program (usually)
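The first-declaration rule above can be sketched as follows, assuming a chronologically ordered list of (term, student, program) declaration records (all names here are hypothetical, not the authors' actual implementation):

```python
def build_cohorts(declarations):
    """Place each student in a program cohort the first time they
    declare that program; later re-declarations are ignored.

    declarations: iterable of (term, student_id, program),
                  assumed sorted chronologically.
    Returns {(program, term): set of student_ids}.
    """
    cohorts = {}
    seen = set()  # (student_id, program) pairs already placed in a cohort
    for term, student, program in declarations:
        if (student, program) in seen:
            continue  # only the first declaration of a program counts
        seen.add((student, program))
        cohorts.setdefault((program, term), set()).add(student)
    return cohorts
```

Because `seen` is keyed by (student, program), a student can still start a new cohort in a *different* program, matching the "one cohort per program (usually)" rule.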
Technical Approach
Multiple Iterations
Started with term-level data. This is helpful for programs, in the context of program history, but NOT for administrators.
Academic Year
Next, we combined the data into academic years.
Summary
All students are lumped into 3 groups. This is the most summarized data we can (read: are willing to) provide. The bottom line = the bold 3-group number.
But how does that compare?
This is the average program data, used as a comparison.
Horizontal stacked bar
• Normally we would never do this, but…
What are people asking?
1. How is the program performing?
2. How are students performing overall at the institution?
3. How many are dropping out?
We graph the 5 flags too
@ 1 year, @ 4 years, @ 6 years
Transitions to completers and drop-outs
Compare performance over time
Main question
But what about next year?
If we study learning as a data science, we can reverse engineer the human brain and tailor learning techniques to maximize the chances of student success. This is the biggest revolution that could happen in education, turning it into a data-driven science, and not such a medieval set of rumors professors tend to carry on.
-Sebastian Thrun
Most data is only viewed in one direction.
A small window of time exists to reach students.
The rising tide of data.
What is Machine Learning?
Machine learning is the science of getting computers to act without being explicitly programmed.
-Stanford University
Supervised Learning
Unsupervised Learning
Discrete
• Supervised: Classification or Categorization
• Unsupervised: Clustering
Continuous
• Supervised: Regression
• Unsupervised: Dimensionality reduction
Supervised
• Classification (discrete): KNN, Trees, Logistic Regression, Naïve-Bayes, SVM
• Regression (continuous): Linear, Polynomial, Decision Trees, Random Forests
Unsupervised
• Association Analysis (discrete): Apriori, FP-Growth; Hidden Markov Model
• Clustering & Dimensionality reduction (continuous): SVD, PCA, K-means
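KNN, listed under supervised classification, is simple enough to sketch in full: predict the majority label of the k nearest training points. A toy illustration in Python (the data and names are made up for this example, not drawn from the deck):

```python
from math import dist  # Euclidean distance, Python 3.8+

def knn_predict(train, query, k=3):
    """train: list of (feature_tuple, label) pairs.
    Returns the majority label among the k nearest neighbors."""
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# Toy retention-style data: (SAT/100, HS GPA) -> retained?
train = [((9.0, 2.1), "no"), ((9.5, 2.4), "no"),
         ((12.0, 3.6), "yes"), ((12.5, 3.8), "yes"), ((13.0, 3.9), "yes")]
```

In practice the features would need scaling so that no single variable dominates the distance, but the voting idea is exactly this.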
Ready to enter the Lab
First-year Retention Example
Student Retention
Binary response: retained yes/no
### CREATING MODEL ###
library(rpart)       # recursive partitioning trees
library(rpart.plot)  # prp() tree plotting

train_new <- DT[1:3032, ]
test_new  <- DT[3033:3790, ]
test_new$fake_retention <- NULL

# Create a new model `my_tree`
my_tree <- rpart(fake_retention ~ SAT + HS.GPA + Gender_Factor + Race_Eth,
                 data = train_new, method = "class",
                 control = rpart.control(cp = 0.0001))

# Visualize your new decision tree
prp(my_tree, type = 4, extra = 100, faclen = 0, under = TRUE,
    compress = FALSE, ycompress = FALSE)

# Make your prediction using `my_tree` and `test_new`
my_prediction <- predict(my_tree, test_new, type = "class")

# Create a data frame with two columns: ID & Retained
# (Retained contains your predictions)
vector_studentID <- test_new$ID
my_solution <- data.frame(ID = vector_studentID, Retained = my_prediction)
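Once predictions exist, the natural next step (not shown in the R above) is to compare them with the actual outcomes. A language-neutral sketch in Python of a binary confusion count and accuracy, with hypothetical names:

```python
def confusion(actual, predicted, positive="yes"):
    """Return TP/FP/TN/FN counts for a binary retained yes/no outcome."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

def accuracy(counts):
    """Fraction of predictions that matched the actual outcome."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else 0.0
```

Because retention data is often imbalanced, accuracy alone can mislead; the separate FP/FN counts show whether the model misses at-risk students or flags too many.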
Brainstorm
Campus Assessment/Effectiveness Framework
Data Dissemination
Key Stakeholders
Education