Big Data Meets Learning Science: Keynote by Al Essa


Big Data Meets Learning Science

Apache Spark Summit East 2017

Alfred Essa, VP, Research and Data Science, McGraw-Hill Education, @malpaso

Our Journey from Print to Digital

2 McGraw-Hill Learning Science

3 Spark, Databricks

1 Innovation Pipeline

Speed of innovation, not data, is the differentiator.

Technology Time to Market

Spark Factor

People Process

Apache Spark

Databricks

Innovation Pipeline

Research

Product Validation

Product Development

Databricks underpins our innovation pipeline and workflow.

2 McGraw-Hill Learning Science

From Print to Digital: 128-year Journey

K-12, Higher Ed & Professional businesses

~4,800 employees


Adaptive Platform Leverages MHE Reach and Scale

How do we learn and how can we learn better?

May 2013: Introduction of SmartBook

Now: 1,500+ adaptive products available

Learners who have used MHE Adaptive: ~5,500,000

Student interactions: ~10,000,000,000

Authors trained to use MHE Adaptive: ~4,000


Research Phase

Research Step: Build Models and Algorithms

Model Iteration: Ship “Data Instrumented” product to improve and tune models

Product Focus: Iterate prototypes with business and customers

Research Question: How do we learn and how can we learn better?

Stacked Algorithm

Learning Tool for Optimizing Acquisition and Recall


Learning Science Principles

Effortful Recall

Spaced Practice

Interleaving

Cognitive Science Model

Mobile App
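The slide names the three learning science principles but not how the stacked algorithm combines them. Purely as an illustration, and not as MHE's algorithm, here is a hypothetical Leitner-style scheduler: each item is a prompt answered from memory (effortful recall), a correct answer pushes the item to a longer review interval while a miss resets it (spaced practice), and due items are drawn round-robin across topics rather than in topic blocks (interleaving). The interval values and the names `Item`, `record_answer` and `next_session` are assumptions.

```python
from dataclasses import dataclass

# Hypothetical review intervals in days, one per Leitner box (box 0 = review daily).
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Item:
    prompt: str   # a question the learner must answer from memory (effortful recall)
    topic: str
    box: int = 0  # index into INTERVALS
    due: int = 0  # day on which the item is next due


def record_answer(item: Item, correct: bool, today: int) -> None:
    """Spaced practice: a correct recall moves the item to a longer interval; a miss resets it."""
    if correct:
        item.box = min(item.box + 1, len(INTERVALS) - 1)
    else:
        item.box = 0
    item.due = today + INTERVALS[item.box]


def next_session(items: list[Item], today: int, size: int) -> list[Item]:
    """Pick up to `size` due items, rotating through topics (interleaving) instead of blocking them."""
    by_topic: dict[str, list[Item]] = {}
    for item in items:
        if item.due <= today:
            by_topic.setdefault(item.topic, []).append(item)
    session: list[Item] = []
    while len(session) < size and any(by_topic.values()):
        for queue in by_topic.values():
            if queue and len(session) < size:
                session.append(queue.pop(0))
    return session


items = [Item("a1", "algebra"), Item("a2", "algebra"), Item("b1", "biology"), Item("c1", "chemistry")]
print([i.prompt for i in next_session(items, today=0, size=4)])  # ['a1', 'b1', 'c1', 'a2']
```

Round-robin selection is the simplest way to interleave; a real system would also weight items by predicted forgetting.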

3 Spark, Databricks

The Problem

1 Students drop out or fail their course

2 At-risk students can be difficult for instructors to identify

Identify at-risk students pre-emptively

The Solution

A classifier to predict abandonment

Jacqueline Feild, Data Scientist

Nicholas Lewkow, Data Scientist

Solution: A Classifier to Predict Abandonment

[Figure: training examples with features F1 to F6 and a binary label (1 or 0) feed a classification algorithm, which predicts the label (here 0) for a new example]

• Logistic regression used as the initial classification algorithm

  • Simple algorithm to interpret

  • Provides probability estimates instead of a hard classification label

  • Allows for simple interpretation of feature importance

• One classifier works for all disciplines
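The deck does not show the model code. As a minimal sketch, assuming two hypothetical features scaled to 0-1 (fraction of assignments submitted, and recency of last login where 1 means longest inactive) and label 1 = abandoned, plain-Python logistic regression trained by batch gradient descent shows the two properties the slide cites: `predict_proba` returns a probability rather than a hard label, and the sign and size of each weight give a rough reading of feature importance. The feature names, data and hyperparameters are illustrative; in a Spark workflow a library implementation such as MLlib's `LogisticRegression` would more likely be used.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: list[list[float]], y: list[int], lr: float = 0.1, epochs: int = 2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Estimated probability of abandonment (label 1), not a hard 0/1 label."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical rows: [fraction of assignments submitted, login inactivity]; 1 = abandoned.
X = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.0], [0.2, 0.9], [0.1, 0.8], [0.3, 1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(w)  # a negative weight lowers predicted risk, a positive one raises it
print(predict_proba(w, b, [0.15, 0.95]))
```

Comparing weight magnitudes as importance is only meaningful when features share a scale, which is why both features here are kept in 0-1.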


Parallel Pipeline for Creating Classifier


Spark Transformation


Speedup with Spark


$S_n = t_1 / t_n$

$S_n$: speedup from $n$ cores

$t_1$: time to run on 1 core

$t_n$: time to run on $n$ cores
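The speedup definition can be checked with a couple of lines of arithmetic. The sketch below also computes parallel efficiency $S_n / n$, a standard companion measure that is not on the slide; the timings are hypothetical, not measurements from the talk.

```python
def speedup(t1: float, tn: float) -> float:
    """S_n = t_1 / t_n: how many times faster the job runs on n cores than on 1 core."""
    if t1 <= 0 or tn <= 0:
        raise ValueError("run times must be positive")
    return t1 / tn


def efficiency(t1: float, tn: float, n: int) -> float:
    """Parallel efficiency S_n / n; 1.0 would mean perfectly linear scaling."""
    return speedup(t1, tn) / n


# Hypothetical timings: 120 minutes on 1 core, 20 minutes on 8 cores.
print(speedup(120, 20))        # 6.0
print(efficiency(120, 20, 8))  # 0.75
```

An efficiency below 1.0 reflects coordination and serial overhead, such as shuffles and scheduling in a Spark job.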


Evaluate Model Accuracy


• Use the area under the receiver operating characteristic curve (AUC-ROC) as another measure of model accuracy

• 0.9-1.0 = excellent

• 0.8-0.9 = good

• 0.7-0.8 = fair

• 0.6-0.7 = poor

• 0.5-0.6 = fail

• Look at how the AUC-ROC for a model changes throughout the semester
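The grading scale and the week-by-week comparison above can be sketched in plain Python. AUC-ROC equals the probability that a randomly chosen positive (here, a student who abandoned) gets a higher score than a randomly chosen negative, with ties counting half. The pairwise loop below is O(P·N) and meant only for illustration; library versions include scikit-learn's `roc_auc_score` and Spark's `BinaryClassificationEvaluator` with `metricName="areaUnderROC"`. Treating each band's lower bound as inclusive is an assumption, and the scores are hypothetical.

```python
def auc_roc(labels: list[int], scores: list[float]) -> float:
    """P(random positive scored above random negative); ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Bands from the slide; lower bounds treated as inclusive (assumption).
BANDS = [(0.9, "excellent"), (0.8, "good"), (0.7, "fair"), (0.6, "poor"), (0.5, "fail")]


def grade(auc: float) -> str:
    for lower, label in BANDS:
        if auc >= lower:
            return label
    return "below 0.5"


# Hypothetical predictions for the same five students early and late in the semester.
labels = [1, 1, 0, 0, 0]
week2 = [0.6, 0.3, 0.4, 0.2, 0.5]
week8 = [0.9, 0.7, 0.2, 0.1, 0.3]
for name, scores in [("week 2", week2), ("week 8", week8)]:
    auc = auc_roc(labels, scores)
    print(name, round(auc, 2), grade(auc))  # week 2 0.67 poor / week 8 1.0 excellent
```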


Evaluate Intervention Window

Intervention Window:

How far in advance of abandonment can we flag a student, so that there is time for an intervention to occur?
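One way to put a number on the intervention window is sketched below, under assumptions not stated in the deck: the model produces one abandonment probability per week, a student counts as flagged the first week that probability reaches a hypothetical threshold of 0.5, and the window is the number of weeks between that flag and the week the student abandoned. The function names and data are illustrative.

```python
from statistics import median
from typing import Optional


def intervention_window(weekly_risk: list[float], abandon_week: int,
                        threshold: float = 0.5) -> Optional[int]:
    """Weeks of lead time between the first flag and abandonment.

    weekly_risk[i] is the predicted abandonment probability in week i (0-indexed);
    only predictions made before abandon_week count. Returns None if never flagged in time.
    """
    for week, risk in enumerate(weekly_risk[:abandon_week]):
        if risk >= threshold:
            return abandon_week - week
    return None


def median_window(students: list[tuple[list[float], int]],
                  threshold: float = 0.5) -> Optional[float]:
    """Median lead time over the students who were flagged before abandoning."""
    windows = [intervention_window(risks, week, threshold) for risks, week in students]
    flagged = [w for w in windows if w is not None]
    return median(flagged) if flagged else None


students = [
    ([0.1, 0.2, 0.55, 0.7, 0.9], 5),  # flagged in week 2, abandoned in week 5: 3 weeks
    ([0.1, 0.2, 0.3, 0.9], 3),        # only flagged after abandoning: no window
    ([0.6, 0.8], 2),                  # flagged in week 0, abandoned in week 2: 2 weeks
]
print(median_window(students))  # 2.5
```

Raising the threshold makes flags more precise but typically shortens the window, so the two are worth reporting together.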

Conclusions

Technology is important, but build an agile innovation workflow with Databricks.
