Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems John Stamper Pittsburgh Science of Learning Center Human-Computer Interaction Institute Carnegie Mellon University 4/8/2013


Page 1: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

John Stamper
Pittsburgh Science of Learning Center
Human-Computer Interaction Institute
Carnegie Mellon University

4/8/2013

Page 2: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

The Classroom of the Future

Which picture represents the “Classroom of the Future”?


Page 3: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

The Classroom of the Future

The answer is both! It depends on how much money you have...

… but maybe not what you think…

Page 4: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

The Classroom of the Future

Rich vs. Poor
– Poor kids will be forced to rely on “cheap” technology
– Rich kids will have access to “expensive” teachers

We are seeing this today!
– Waldorf school in Silicon Valley – no technology
– NGLC Wave III Grants
– MOOCs
– Growth of adaptive technology companies
– Online instruction
– … and more…

Page 5: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

What does this mean?

My view is that we cannot stop this; I believe we must accept that economics will force this route.

We should focus on improving learning technology
• New ways to improve teacher-student access
• Add more adaptive features to learning software

Adaptive learning, at scale, using data!

Page 6: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems


Educational Data Mining

• “Educational Data Mining is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings which they learn in.” – www.educationaldatamining.org

Page 7: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Types of EDM methods (Baker & Yacef, 2009)

• Prediction
  – Classification
  – Regression
  – Density estimation
• Clustering
• Relationship mining
  – Association rule mining
  – Correlation mining
  – Sequential pattern mining
  – Causal data mining
• Distillation of data for human judgment
• Discovery with models

Page 8: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Emerging Communities

• Society for Learning Analytics Research
  – First conference: LAK2011
• International Educational Data Mining Society
  – First conference: EDM2008
  – Publishing JEDM since 2009
• Plus a growing number of great people working in this area who are not (yet) closely affiliated with either community

Page 9: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Emerging Communities

• Joint goal of exploring the “big data” now available on learners and learning
• To promote
  – New scientific discoveries & to advance learning sciences
  – Better assessment of learners along multiple dimensions
    • Social, cognitive, emotional, meta-cognitive, etc.
    • Individual, group, institutional, etc.
  – Better real-time support for learners

Page 10: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems


EDM Methods to discuss

• Prediction – understand what the student knows

• Discovery with models – improve understanding of the structure of knowledge

Page 11: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

LearnLab
Pittsburgh Science of Learning Center (PSLC)

• Created to bridge the chasm between science & practice
  – Low success rate (<10%) of randomized field trials
• LearnLab = a socio-technical bridge between lab psychology & schools
  – E-science of learning & education
  – Social processes for research-practice engagement
• Purpose: Leverage cognitive theory and computational modeling to identify the conditions that cause robust student learning

Page 12: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

LearnLab: Data-driven improvement infrastructure

Ed tech + wide use = Research in practice

Ed tech examples: Chemistry Virtual Lab, Algebra Cognitive Tutor, English Grammar Tutor, Educational Games

• 2004-14, ~$50 million
• Tech enhanced courses, assessment, & research
• School cooperation
• In vivo experiments

Page 13: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Interaction data is surprisingly revealing

• Accurate assessment during learning
  – Online interactions => state tests (R = .82)
• Detect student work ethic, engagement …
• Discover better models of what is hard to learn
  – Learning Curve Analysis: flat curve => improvement opportunity

Page 14: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

DataShop

• Central Repository
  – Secure place to store & access research data
  – Supports various kinds of research
    • Primary analysis of study data
    • Exploratory analysis of course data
    • Secondary analysis of any data set
• Analysis & Reporting Tools
  – Focus on student-tutor interaction data
  – Data Export
    • Tab delimited tables you can open with your favorite spreadsheet program or statistical package
    • Web services for direct access
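A minimal sketch of working with a tab-delimited export outside a spreadsheet. The column names and sample rows below are hypothetical placeholders, not the actual DataShop export schema; substitute the headers found in the real file.

```python
import csv
import io

# Hypothetical excerpt of a tab-delimited export. Column names and values
# are illustrative assumptions, not a guaranteed DataShop schema.
SAMPLE_EXPORT = (
    "student\tproblem\tstep\toutcome\tkc\n"
    "s1\tp1\tstep1\tINCORRECT\tcircle-area\n"
    "s1\tp2\tstep1\tCORRECT\tcircle-area\n"
    "s2\tp1\tstep1\tCORRECT\tcircle-area\n"
)


def read_transactions(text):
    """Parse tab-delimited text into a list of dicts, one per transaction."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = read_transactions(SAMPLE_EXPORT)
correct = sum(1 for r in rows if r["outcome"] == "CORRECT")
print(len(rows), "transactions,", correct, "correct")
```

For a file on disk, the same reader can be given an open file handle instead of the in-memory string.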

Page 15: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Repository

• Allows for full data management
• Controlled access for collaboration
• File attachments
• Paper attachments
• Great for secondary analyses

How big is DataShop?

Page 16: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

How big is DataShop?

Domain      Files   Papers   Datasets   Student Actions   Students   Student Hours
Language       64       11         78         6,237,523      6,499           6,877
Math          222       53        189        75,754,530     37,218         173,175
Science        92       19         93        13,849,756     16,939          45,465
Other          18       12         50         8,604,016     13,018          31,111
Total         396       95        410       104,445,825     73,674         256,630

As of April 2013

Page 17: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

What kinds of data?

• By domain based on studies from the LearnLabs
• Data from intelligent tutors
• Data from online instruction
• Data from games

The data is fine-grained, at the transaction level!

Page 18: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Web Application

Page 19: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Getting to DataShop

• Explore data through the DataShop tools
• Where is DataShop?
  – http://pslcdatashop.org
  – Linked from DataShop homepage and learnlab.org
    • http://pslcdatashop.web.cmu.edu/about/
    • http://learnlab.org/technologies/datashop/index.php

Page 20: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

DataShop Terminology

• KC: Knowledge component
  – also known as a skill/concept/fact
  – a piece of information that can be used to accomplish tasks
  – tagged at the step level
• KC Model:
  – also known as a cognitive model or skill model
  – a mapping between problem steps and knowledge components
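As a rough illustration of the terminology above, a KC model can be held as a plain mapping from steps to KC labels and converted to a 0/1 matrix (the "Q matrix" used later in the talk). The step names and KC labels here are invented for the example.

```python
# Hypothetical step names and KC labels, for illustration only.
kc_model = {
    "find-circle-area": ["circle-area"],
    "find-square-area": ["square-area"],
    "find-shaded-area": ["circle-area", "square-area", "compose-by-subtraction"],
}


def to_q_matrix(model):
    """Return (steps, kcs, matrix) where matrix[i][j] is 1 if step i needs KC j."""
    steps = sorted(model)
    kcs = sorted({kc for labels in model.values() for kc in labels})
    matrix = [[1 if kc in model[step] else 0 for kc in kcs] for step in steps]
    return steps, kcs, matrix


steps, kcs, q = to_q_matrix(kc_model)
for step, row in zip(steps, q):
    print(step, row)
```

Rows are steps and columns are KCs, so a step tagged with several KCs has several 1s in its row.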

Page 21: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Getting the KC Model Right!

The KC model drives instruction in adaptive learning
– Problem and topic sequence
– Instructional messages
– Tracking student knowledge

Page 22: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

What makes a good KC Model?

• A correct expert model is one that is consistent with student behavior.

• Predicts task difficulty
• Predicts transfer between instruction and test

The model should fit the data!

Page 23: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Good KC Model => Good Learning Curve

• An empirical basis for determining when a cognitive model is good
• Accurate predictions of student task performance & learning transfer
  – Repeated practice on tasks involving the same skill should reduce the error rate on those tasks
    => A declining learning curve should emerge
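The learning curve described above can be sketched as the error rate at each practice opportunity. This is a simplified illustration, assuming each attempt is already tagged with a student, a KC, and whether the first attempt was an error; the sample data is invented.

```python
from collections import defaultdict


def learning_curve(attempts):
    """Compute error rate by opportunity number.

    attempts: iterable of (student, kc, is_error) in chronological order.
    The n-th time a student meets a KC is that student's n-th opportunity.
    Returns {opportunity_number: error_rate}, pooled over students and KCs.
    """
    seen = defaultdict(int)    # (student, kc) -> opportunities so far
    errors = defaultdict(int)  # opportunity -> number of errors
    totals = defaultdict(int)  # opportunity -> number of attempts
    for student, kc, is_error in attempts:
        seen[(student, kc)] += 1
        opportunity = seen[(student, kc)]
        totals[opportunity] += 1
        errors[opportunity] += int(is_error)
    return {opp: errors[opp] / totals[opp] for opp in sorted(totals)}


# Invented sample: two students practicing one KC three times each.
attempts = [
    ("s1", "circle-area", True), ("s2", "circle-area", True),
    ("s1", "circle-area", True), ("s2", "circle-area", False),
    ("s1", "circle-area", False), ("s2", "circle-area", False),
]
curve = learning_curve(attempts)
print(curve)
```

With a well-chosen KC, the values should fall as the opportunity number grows, which is the declining curve the slide refers to.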

Page 24: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

A Good Learning Curve


Page 25: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

How do we make KC Models?


Page 26: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Traditionally CTA has been used

But Cognitive Task Analysis has some issues…
– Extremely human driven
– It is highly subjective
– Leading to differing results from different analysts

And these human discovered models are usually wrong!

Page 27: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

If Human centered CTA is not the answer

How should these models be designed?

They shouldn’t!

The models should be discovered not designed!


Page 28: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Solution
– We have lots of log data from tutors and other systems
– We can harness this data to validate and improve existing student models

Page 29: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Human-Machine Student Model Discovery

DataShop provides an easy interface to add and modify KC models and ranks the models using AFM
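AFM is not expanded on the slide. Assuming it refers to the Additive Factors Model as it is usually written in the Learning Factors Analysis literature, the predicted probability of a correct step combines a student proficiency term with, for each KC on the step, an easiness term and a learning rate multiplied by prior opportunities. The sketch below only evaluates that formula; parameter values are invented, and fitting them is out of scope.

```python
import math


def afm_probability(theta, betas, gammas, q_row, opportunities):
    """Probability of a correct response under an additive factors form.

    theta:         student proficiency (assumed already estimated)
    betas:         per-KC easiness
    gammas:        per-KC learning rate
    q_row:         0/1 flags saying which KCs the step requires
    opportunities: prior practice counts for this student on each KC
    """
    logit = theta + sum(
        q * (beta + gamma * t)
        for q, beta, gamma, t in zip(q_row, betas, gammas, opportunities)
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Invented parameters: the step needs only the first of two KCs.
p_early = afm_probability(0.2, [-1.0, 0.5], [0.3, 0.1], [1, 0], [0, 5])
p_late = afm_probability(0.2, [-1.0, 0.5], [0.3, 0.1], [1, 0], [4, 5])
print(round(p_early, 3), round(p_late, 3))
```

Because the learning rate here is positive, the prediction rises with practice on the required KC, while practice on KCs the step does not use has no effect.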

Page 30: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Human-Machine Student Model Discovery

3 strategies for discovering improvements to the student model

– Smooth learning curves

– No apparent learning

– Problems with unexpected error rates

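One way the "no apparent learning" strategy could be approximated in code is to fit a straight line to each KC's learning curve and flag KCs whose error rate stays high without declining. This is an illustrative heuristic, not DataShop's method; the thresholds and sample curves are arbitrary assumptions.

```python
def slope(curve):
    """Least-squares slope of error rate against opportunity number.

    curve: {opportunity_number: error_rate} with at least two distinct keys.
    """
    xs = list(curve)
    ys = [curve[x] for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def flag_flat(curves, min_decline=0.02, min_error=0.2):
    """Return KCs with a high mean error rate that declines too slowly.

    min_decline and min_error are arbitrary illustrative thresholds.
    """
    flagged = []
    for kc, curve in curves.items():
        mean_error = sum(curve.values()) / len(curve)
        if slope(curve) > -min_decline and mean_error >= min_error:
            flagged.append(kc)
    return flagged


# Invented curves: one declining, one flat.
curves = {
    "circle-area": {1: 0.6, 2: 0.4, 3: 0.2},
    "compose-by-addition": {1: 0.5, 2: 0.5, 3: 0.5},
}
flat = flag_flat(curves)
print(flat)
```

A flagged KC is a candidate for inspection, for example to check whether it hides more than one skill.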

Page 31: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

A good cognitive model produces a learning curve

Is this the correct or “best” cognitive model?

Without decomposition, using just a single “Geometry” skill, no smooth learning curve.

But with decomposition, 12 skills for area, a smooth learning curve.

(Rise in error rate because poorer students get assigned more problems)

Page 32: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Inspect curves for individual knowledge components (KCs)

Many curves show a reasonable decline

Some do not => Opportunity to improve model!

Page 33: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

No apparent Learning


Page 34: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Problems with Unexpected Error Rates


Page 35: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Inspect problems to hypothesize new KC labels

• Here scaffolding is originally absent, but other problems have fixed scaffolding
  – They start with columns for square & area

Page 36: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

These strategies suggest an improvement

– We hypothesized that additional skills were involved in some of the compose-by-addition problems

– A new student model (better BIC value) suggests splitting the skill.

Page 37: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Redesign based on Discovered Model

Our discovery suggested changes needed to be made to the tutor

– Resequencing – put problems requiring fewer skills first
– Knowledge Tracing – adding new skills
– Creating new tasks – new problems
– Changing instructional messages, feedback or hints

Page 38: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Study: Current tutor is control

• Current fielded tutor only uses scaffolded problems

Page 39: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Study: Treatment

• Scaffolded, given areas, plan-only, & unscaffolded

• Isolate practice on problem decomposition

Page 40: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Study Results

• Much more efficient & better learning on targeted decomposition skills

[Figure: Post-test % correct by item type (Composition, Area), comparing Control (original tutor) with Treatment (model-based redesign); vertical axis from 0.7 to 1.]

[Figure: Instructional time (minutes) by step type (Composition steps, Area and other steps), comparing Control (original tutor) with Treatment (model-based redesign); vertical axis from 0 to 30.]

Page 41: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Translational Research Feedback Loop

Design, Deploy, Data, Discover

Page 42: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Can a data-driven process be automated & brought to scale?

Yes!

• Combine Cognitive Science, Psychometrics, Machine Learning …

• Collect a rich body of data
• Develop new model discovery algorithms, visualizations, & on-line collaboration support

Page 43: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

DataShop’s “leaderboard” ranks discovered cognitive models

100s of datasets coming from ed tech in math, science, & language

Some models are machine generated (based on human-generated learning factors)

Some models are human generated

Page 44: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Metrics for model prediction

• AIC & BIC penalize for more parameters, fast & consistent
• 10 fold cross validation
• Minimize root mean squared error (RMSE) on unseen data
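The metrics named above follow standard definitions: AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of parameters, n the number of observations, and L the likelihood, while RMSE compares predicted probabilities with 0/1 outcomes. A small sketch with invented predictions:

```python
import math


def log_likelihood(predictions, outcomes):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for p, y in zip(predictions, outcomes)
    )


def aic(ll, k):
    """Akaike information criterion; k is the number of parameters."""
    return 2 * k - 2 * ll


def bic(ll, k, n):
    """Bayesian information criterion; n is the number of observations."""
    return k * math.log(n) - 2 * ll


def rmse(predictions, outcomes):
    """Root mean squared error between predictions and 0/1 outcomes."""
    squared = [(p - y) ** 2 for p, y in zip(predictions, outcomes)]
    return math.sqrt(sum(squared) / len(squared))


# Invented example: a one-parameter model that always predicts 0.5.
preds, actual = [0.5, 0.5], [1, 0]
ll = log_likelihood(preds, actual)
print(aic(ll, 1), bic(ll, 1, len(actual)), rmse(preds, actual))
```

Lower is better for all three. In cross-validation, `rmse` would be computed on held-out folds rather than on the data used for fitting.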

Page 45: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Automated search for better models

Learning Factors Analysis (LFA) (Cen, Koedinger, & Junker, 2006)
• Method for discovering & evaluating cognitive models
• Finds model “Q matrix” that best predicts student learning data
• Inputs
  – Data: Student success on tasks over time
  – Factors hypothesized to explain learning
• Outputs
  – Rank order of most predictive Q matrix
  – Parameter estimates for each

Page 46: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Simple search process example: modifying Q matrix by input factor to get new Q’ matrix

• Q matrix factor Sub split by factor Neg-result
• Produces new Q matrix
• Two new KCs (Sub-Pos & Sub-Neg) replace old KC (Sub)
• Redo opportunity counts
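The split described on this slide can be sketched on step-level rows: relabel every Sub step as Sub-Neg or Sub-Pos according to the Neg-result factor, then recount opportunities per student and KC. The row layout and field names are assumptions made for the example, not the LFA implementation.

```python
from collections import defaultdict


def split_kc(rows, kc, factor, true_label, false_label):
    """Relabel `kc` as true_label or false_label depending on a boolean factor."""
    new_rows = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        if row["kc"] == kc:
            row["kc"] = true_label if row[factor] else false_label
        new_rows.append(row)
    return new_rows


def add_opportunity_counts(rows):
    """Number each row with the student's running count for that row's KC."""
    seen = defaultdict(int)
    counted = []
    for row in rows:
        key = (row["student"], row["kc"])
        seen[key] += 1
        counted.append({**row, "opportunity": seen[key]})
    return counted


# Invented rows for one student, in chronological order.
rows = [
    {"student": "s1", "kc": "Sub", "neg_result": False},
    {"student": "s1", "kc": "Sub", "neg_result": True},
    {"student": "s1", "kc": "Sub", "neg_result": False},
]
before = add_opportunity_counts(rows)
after = add_opportunity_counts(split_kc(rows, "Sub", "neg_result", "Sub-Neg", "Sub-Pos"))
print([(r["kc"], r["opportunity"]) for r in after])
```

Before the split the three steps are opportunities 1, 2, 3 of Sub; afterwards the second step becomes the first opportunity of Sub-Neg, which is why the counts have to be redone.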

Page 47: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

LFA: Best First Search Process

• Search algorithm guided by a heuristic: AIC
• Start with single skill cog model (Q matrix)

[Figure: search tree rooted at the original model (BIC = 4328). Child models are produced by splits such as Split by Embed, Split by Backward, and Split by Initial; node values shown range from 4248 to 4325, with the annotations “50+” and “15 expansions later”.]

Cen, H., Koedinger, K., Junker, B. (2006). Learning Factors Analysis: A general method for cognitive model evaluation and improvement. 8th International Conference on Intelligent Tutoring Systems.
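A generic best-first search of the kind this slide describes can be sketched as follows. It is not the LFA implementation: a model is represented as the set of splits applied so far, and the scoring function is a toy stand-in (with invented numbers) for fitting the model and computing AIC or BIC.

```python
import heapq

FACTORS = ["Embed", "Backward", "Initial"]

# Invented gains and penalty, used only to make the toy score computable.
TOY_GAIN = {"Embed": 10, "Backward": 6, "Initial": 2}


def toy_score(model):
    """Stand-in for an information criterion: lower is better."""
    return 100 - sum(TOY_GAIN[f] for f in model) + 3 * len(model)


def expand(model):
    """All models reachable by applying one more split."""
    return [model | {f} for f in FACTORS if f not in model]


def best_first_search(start, expand, score, max_expansions):
    """Repeatedly expand the lowest-scoring open model; return the best seen."""
    counter = 0  # tie-breaker so the heap never compares models directly
    frontier = [(score(start), counter, start)]
    best_score, best_model = frontier[0][0], start
    seen = {start}
    for _step in range(max_expansions):
        if not frontier:
            break
        _, _, model = heapq.heappop(frontier)
        for child in expand(model):
            if child in seen:
                continue
            seen.add(child)
            child_score = score(child)
            counter += 1
            heapq.heappush(frontier, (child_score, counter, child))
            if child_score < best_score:
                best_score, best_model = child_score, child
    return best_model, best_score


best_model, best_score = best_first_search(frozenset(), expand, toy_score, 15)
print(sorted(best_model), best_score)
```

In a real search, `score` would refit the statistical model for each candidate Q matrix, so the expansion budget matters far more than it does in this toy.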

Page 48: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Scientist “crowd”sourcing: Feature input comes “for free”

Scientist generated models


Union of all hypothesized KCs in human generated models

Page 49: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Validating Learning Factors Analysis

• Discovers better cognitive models in 11 of 11 datasets …

Koedinger, McLaughlin, & Stamper (2012). Automated student model improvement. In Proceedings of the Fifth International Conference on Educational Data Mining. [Conference best paper.]

Page 50: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Data from a variety of educational technologies & domains

Numberline Game

Statistics Online Course

English Article Tutor

Algebra Cognitive Tutor

Page 51: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Applying LFA across domains

11 of 11 improved models

9 of 11 equal or greater learning

Variety of domains & technologies

Page 52: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Can we go even bigger?


Page 53: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Competitions?

Page 54: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

KDD Cup Competition

Knowledge Discovery and Data Mining (KDD) is the most prestigious conference in the data mining and machine learning fields

KDD Cup is the premier data mining challenge

2010 KDD Cup called “Educational Data Mining Challenge”

Ran from April 2010 through June 2010

Page 55: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

KDD Cup Competition

Competition goal is to predict student responses given tutor data provided by Carnegie Learning

Dataset                       Students        Steps   File size
Algebra I 2008-2009              3,310    9,426,966   3 GB
Bridge to Algebra 2008-2009      6,043   20,768,884   5.43 GB

Page 56: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

KDD Cup Competition

655 registered participants

130 participants who submitted predictions

3,400 submissions

Page 57: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

KDD Cup Competition

Advances in prediction, cognitive modeling, new methods applied to EDM

Spawned a number of workshops and papers

The datasets are now in the “wild” and showing up in non-KDD conferences

New competitions to continue momentum

Page 58: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Marigames.org

• Two stage competition with $100,000 in prizes
  – $50,000 Game Development
  – $50,000 Educational Data Mining
• Goal is to go beyond individual datasets
• This requires common data formats

Page 59: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Takeaways

• The amount of data coming from educational technology is growing exponentially
• Huge potential for EDM to improve educational systems
• Optimal instructional design requires discoveries (The student is not like me)
• These methods require common forms of data for analysis (standards!)

Page 60: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Opportunities

• New Learning Science and Engineering professional master’s degree at Carnegie Mellon University

• New concentration in Learning Analytics, MA in Cognitive Studies in Education at Teachers College, Columbia University

• Other programs in the works


Page 61: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Thank you

Special Thanks to:
Ken Koedinger, Director LearnLab
Ryan Baker, President IEDMS
Steve Ritter, Carnegie Learning

Page 62: Using Data-Driven Discovery Techniques for the Design and Improvement of Educational Systems

Questions?

http://pslcdatashop.org

[email protected]
dev.stamper.org