Data mining with DataShop Ken Koedinger CMU Director of PSLC Professor of Human-Computer Interaction & Psychology Carnegie Mellon University Ryan S.J.d. Baker PSLC/HCII Carnegie Mellon University


Data mining with DataShop

Ken Koedinger CMU Director of PSLC

Professor of Human-Computer Interaction & Psychology

Carnegie Mellon University

Ryan S.J.d. Baker, PSLC/HCII

Carnegie Mellon University

Overview

• Motivation for educational data mining (next)
• DataShop
• Learning curves to improve cognitive models
• Past project example
• Conclusion


What is educational data mining?

“The area of scientific inquiry centered around the development of methods for making discoveries within the unique kinds of data that come from educational settings, and using those methods to better understand students and the settings which they learn in.” (Baker, under review)

What is educational data mining?

• More informally: using “large” data sets to answer educational and psychological questions
  • What “large” means is always changing
• Developing methods or algorithms to aid in discovery

What is educational data mining?

• One popular data source is “instrumented” computer tutors
  • Fine grained, longitudinal, often across contexts
• Other data sources
  • Records of online courses (e.g. WebCAT)
  • District or university-level student records
  • Example: www.icpsr.umich.edu/IAED

Educational Data Mining is a hot topic!

• 2008: First International Conference on Educational Data Mining
• 2008: Launch of Journal of Educational Data Mining
• 2009: Second International Conference on Educational Data Mining
  • Submissions due in March 2009
• www.educationaldatamining.org

Data Mining Questions & Methods

• How can we reliably model student knowledge or achievement?
  • Bayesian Knowledge Tracing
    • Simple type of “Bayes Net”, getting less simple all the time
  • Item Response Theory (IRT)
    • Basis for standardized tests, SAT, GRE, TIMSS…
    • Version of “logistic regression”
    • Many variations & generalizations …
• See slides of Brian Junker’s EDM08 invited talk
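To make the Bayesian Knowledge Tracing idea concrete, here is a minimal sketch of the standard four-parameter update. The class and function names and the parameter values are illustrative assumptions, not taken from the slides or from DataShop.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    # All four names are illustrative; values must be fit to data in practice.
    p_init: float   # P(L0): skill already known before any practice
    p_learn: float  # P(T): chance of learning the skill at each opportunity
    p_guess: float  # P(G): chance of a correct answer without the skill
    p_slip: float   # P(S): chance of an error despite knowing the skill


def predict_correct(p_known: float, params: BKTParams) -> float:
    """Probability that the next response is correct."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


def update(p_known: float, correct: bool, params: BKTParams) -> float:
    """Bayesian update of P(known) after observing one response."""
    if correct:
        evidence = p_known * (1 - params.p_slip)
        posterior = evidence / (evidence + (1 - p_known) * params.p_guess)
    else:
        evidence = p_known * params.p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - params.p_guess))
    # The student may also acquire the skill at this opportunity.
    return posterior + (1 - posterior) * params.p_learn


def trace(responses, params: BKTParams):
    """Return P(known) after each response in a chronological sequence."""
    p_known = params.p_init
    history = []
    for correct in responses:
        p_known = update(p_known, correct, params)
        history.append(p_known)
    return history
```

Calling `trace([False, True, True], BKTParams(0.3, 0.2, 0.2, 0.1))` returns the estimate of mastery after each of the three responses; a correct answer raises the estimate and an error lowers it.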

Data Mining Questions & Methods

• What’s the nature of knowledge students are learning?
• How can we discover cognitive models of student learning?
  • Learning Factors Analysis (LFA)
    • Extends IRT to account for learning
    • Search algorithm: Discover cognitive model(s) that capture how student learning transfers over tasks over time
  • Rule space, knowledge space, …

Data Mining Questions & Methods

• How can we model students, beyond just what they know?
• Models of
  • Choices: Metacognitive & Motivational
    • Help-seeking
    • Gaming the System
    • Off-Task Behavior
    • Self-explanation
  • Affect
• Involves prediction methods such as classification, regression (not just linear regression)

Data Mining Questions & Methods

• What features of a tutor lead to the most learning?
• Learning Decomposition
  • Explores different rates of learning due to different forms of pedagogical support
  • Close relative of Learning Factors Analysis

Data Mining Questions & Methods

• How to extract reliable inferences about causal mechanisms from correlations in data?
• Causal modeling using Tetrad

Data Mining Questions & Methods

• And one generally useful tool for figuring out what’s going on, in any of these cases: Exploratory data analysis
  • Summary & visualization tools in DataShop
  • Tools in Excel
  • Clustering algorithms
  • Visualization packages

Overview

• Motivation for educational data mining
• DataShop (next)
• Learning curves to improve cognitive models
• Past project example
• Conclusion


Find DataShop at learnlab.org/datashop



Video Intro of DataShop …

View here:


DataShop – Dataset Tabs

• Public datasets that you can view only.
• Private datasets you can’t view. Email us and the PI to get access.
• Datasets you can view or edit. You have to be a project member or PI for the dataset to appear here.

Analysis Tools

• Dataset Info
• Performance Profiler
• Learning Curve
• Error Report
• Export
• Sample Selector

Dataset Info

• Meta data for given dataset
• PIs get ‘edit’ privileges, others must request it
• Papers and Files storage
• Dataset Metrics
• Problem Breakdown table

Performance Profiler

Multipurpose tool to help identify areas that are too hard or easy

Aggregate by
• Step
• Problem
• KC
• Dataset Level

View measures of
• Error Rate
• Assistance Score
• Avg # Hints
• Avg # Incorrect
• Residual Error Rate
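As a rough illustration of what aggregating by KC involves, the sketch below computes four of the listed measures from step-level records. The field names (`kc`, `hints`, `incorrects`, `first_attempt`) are hypothetical, and the definitions used (error rate as the share of steps whose first attempt was not correct, assistance score as hints plus incorrect attempts) are assumptions rather than DataShop's documented formulas.

```python
from collections import defaultdict


def profile_by_kc(steps):
    """Summarize step records per knowledge component.

    Each record is a dict with hypothetical keys:
    'kc', 'hints', 'incorrects', 'first_attempt'.
    """
    groups = defaultdict(list)
    for step in steps:
        groups[step["kc"]].append(step)

    summary = {}
    for kc, rows in groups.items():
        n = len(rows)
        hints = sum(r["hints"] for r in rows)
        incorrects = sum(r["incorrects"] for r in rows)
        # Assumed definition: a hint or an error on the first attempt counts as an error.
        errors = sum(1 for r in rows if r["first_attempt"] != "correct")
        summary[kc] = {
            "error_rate": errors / n,
            "avg_hints": hints / n,
            "avg_incorrects": incorrects / n,
            "assistance_score": (hints + incorrects) / n,
        }
    return summary
```

Sorting the resulting dictionary by `error_rate` gives a quick view of which KCs look too hard or too easy.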

Learning Curve

Visualizes changes in student performance over time

• View by KC or Student, Assistance Score or Error Rate
• Time is represented on the x-axis as ‘opportunity’, or the # of times a student (or students) had an opportunity to demonstrate a KC
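The ‘opportunity’ axis can be sketched in a few lines: count, for each student and KC, how many times that KC has come up so far, then average the error rate at each count. This is a minimal sketch assuming chronologically ordered `(student, kc, correct)` tuples, a hypothetical input format rather than DataShop's own.

```python
from collections import defaultdict


def learning_curve(observations):
    """Return {opportunity number: error rate} from ordered observations.

    observations: iterable of (student, kc, correct) tuples in time order,
    where correct is True if the first attempt at the step was right.
    """
    seen = defaultdict(int)               # (student, kc) -> opportunities so far
    totals = defaultdict(lambda: [0, 0])  # opportunity -> [errors, count]
    for student, kc, correct in observations:
        seen[(student, kc)] += 1
        opportunity = seen[(student, kc)]
        totals[opportunity][0] += 0 if correct else 1
        totals[opportunity][1] += 1
    return {opp: errors / count for opp, (errors, count) in sorted(totals.items())}
```

Because the opportunity count depends on which KC each step is assigned to, relabeling the same steps under a different KC model changes the curve without changing the underlying data.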

Error Report

• Provides a breakdown of problem information (by step) for fine-grained analysis of problem-solving behavior
• Attempts are categorized by student
• View by Problem or KC

Export

• Two types of export available
  • By Transaction
  • By Step
• Anonymous, tab-delimited file
• Easy to import into Excel!
• You can also export the Problem Breakdown table and LFA values!
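Since the export is a tab-delimited file, it can also be read with any language's standard CSV tooling. The sketch below uses Python's `csv` module on an in-memory sample; the column names and step labels are placeholders invented for illustration, not DataShop's actual export headers.

```python
import csv
import io

# Hypothetical stand-in for a by-step export; real exports have other columns.
SAMPLE_EXPORT = (
    "student\tproblem\tstep\tfirst_attempt\n"
    "anon_01\tTWO_CIRCLES_IN_SQUARE\tstep_1\tcorrect\n"
    "anon_01\tTWO_CIRCLES_IN_SQUARE\tstep_2\tincorrect\n"
    "anon_02\tTWO_CIRCLES_IN_SQUARE\tstep_1\thint\n"
)


def read_export(handle):
    """Parse a tab-delimited export into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_export(io.StringIO(SAMPLE_EXPORT))
students = sorted({row["student"] for row in rows})
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` to `read_export` instead of the `StringIO` object.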

Sample Selector

Easily create a sample/filter to view a smaller subset of data

Filter by
• Condition
• Dataset Level
• Problem
• School
• Student
• Tutor Transaction

Shared (only owner can edit) and private samples

Help/Documentation

• Extensive documentation with examples
• Contextual by tool/report
• http://learnlab.web.cmu.edu/datashop/help
• Glossary of common terms, tied in with PSLC Theory wiki

New Features

• Manage Knowledge Component models
  • Create, Modify & Delete KC models within DataShop
• Addition of Latency Curves to Learning Curve Reporting
  • Time to Correct
  • Assistance Time
• Problem Rollup & Export
• Enhanced Contextual Help

Overview

• Motivation for educational data mining
• DataShop
• Learning curves to improve cognitive models (next)
• Past project example
• Conclusion

Cognitive Modeling Challenge

• Premise: High quality instructional design requires a high quality cognitive model of student thinking
• Problem: Creating such a Cognitive Model is hard to get right
  • Hard to program, but more importantly …
  • A high quality cognitive model requires a deep understanding of student thinking
  • Cognitive models created by intuition are often wrong (e.g., Koedinger & Nathan, 2004)

Significance of improving a cognitive model

• A better cognitive model means better:
  • Assessment
  • Instructional feedback & hints (model tracing)
  • Activity selection & pacing (knowledge tracing)
• Better cognitive models advance basic cognitive science

Using student data to build better cognitive models

• Cognitive Task Analysis methods
  • Think alouds, Difficulty Factors Assessment
  • General lecture Tuesday
• Peer collaboration dialog analysis
  • TagHelper track
• Data mining of student interactions with on-line tutors
  • DataShop track

Knowledge components are the “germ theory” of transfer

Germs are hidden elements that carry disease from one agent to another

Knowledge components are hidden elements that carry learning experiences from one situation to another -- they account for transfer

DataShop Supports Theory Integration

• Makes micro theory concrete
• Knowledge decomposability hypothesis
  • Acquisition of academic competencies can be decomposed into units, called knowledge components, that yield predictions about student task performance & the transfer of learning.
• Not obviously true
  • “learning, cognition, knowing, and context are irreducibly co-constituted and cannot be treated as isolated entities or processes” (Barab & Squire, 2004)

Learning curves show performance changes over time

• Learning curves:
  • Student data
  • Statistical model fit (blue line)
• Based on micro level analysis:
  • learning event opportunities
• Averaged across knowledge components


Not a smooth learning curve -> this knowledge component model is wrong. Does not capture genuine student difficulties.


This more specific knowledge component (KC) model (2 KCs) is also wrong -- still no smooth drop in error rate.


Ah! Now we get a smoother learning curve. A more specific decomposition (12 KCs) better tracks the nature of student difficulties & transfer from one problem situation to another

(Rise near end due to fewer observations biased toward poorer students)

Summary: KC model as “germ theory”

• Without decomposition, using just a single “Geometry” KC, no smooth learning curve.
• But with decomposition, 12 KCs for area concepts, a smooth learning curve.

Upshot: A decomposed KC model fits learning & transfer data better than a “faculty theory” of mind

Overview

• Motivation for educational data mining
• DataShop
• Learning curves to improve cognitive models
• Past project example (next)
• Conclusion

Past Project Example

• Rafferty (Stanford) & Yudelson (Pitt)
• Analyzed a data set from Geometry
• Applied Learning Factors Analysis (LFA)
• Driving questions:
  • Are students learning at the same rate as assumed in prior LFA models?
  • Do we need different cognitive models (KC models) to account for low-achieving vs. high-achieving students?

A Statistical Model for Learning Curves

• Predicts whether student is correct depending on knowledge & practice
• Additive Factor Model (Draney, et al. 1995, Cen, Koedinger, Junker, 2006)
• Learning rate is different for different skills, but not for different students
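The equation on this slide was an image and did not survive. As a hedged reminder, the Additive Factor Model is commonly written as logit(p) = theta_i + sum over the KCs k in the step of (beta_k + gamma_k * T_ik), where theta_i is student proficiency, beta_k is KC easiness, gamma_k is the KC's learning rate, and T_ik counts student i's prior opportunities on KC k. The sketch below evaluates that form; the function and argument names are illustrative, and fitting the parameters is out of scope here.

```python
import math


def afm_probability(theta, kcs, beta, gamma, opportunities):
    """Predicted probability of a correct response under an additive factor form.

    theta: student proficiency (one value per student)
    kcs: KC names required by the step
    beta: {kc: easiness}
    gamma: {kc: learning rate}, shared by all students
    opportunities: {kc: prior practice count for this student}
    """
    logit = theta
    for kc in kcs:
        logit += beta[kc] + gamma[kc] * opportunities.get(kc, 0)
    return 1.0 / (1.0 + math.exp(-logit))
```

Note that `gamma` is indexed by KC only, which is exactly the slide's point: the learning rate varies across skills but not across students.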


Low-Start High-Learn (LSHL) group has a faster learning rate than other groups of students


Rafferty & Yudelson Results 2

• Is it “faster” learning or “different” learning?
  • Fit with a more compact model is better for low start high learn
• Students with an apparent faster learning rate are learning a more “compact”, general and transferable domain model
• Resulted in best Young Researcher Track paper at AIED07

Overview

• Motivation for educational data mining
• DataShop
• Learning curves to improve cognitive models
• Past project example
• Conclusion (next)

Lots of interesting questions to be addressed with Ed Data Mining!!

• Assessment questions
  • Can on-line embedded assessment replace standardized tests?
  • Can assessment be accurate if students are learning during test?
• Learning theory questions
  • What are the “elements of transfer” in human learning?
  • Is learning rate driven by student variability or content variability?
  • Can conceptual change be tracked & better understood?
• Instructional questions
  • What instructional moves yield the greatest increases in learning?
  • Can we replace ANOVA with learning curve comparison to better evaluate learning experiments?
• Metacognition & motivation questions
  • Can student affect & motivation be detected in on-line click stream data?
  • Can student metacognitive & self-regulated learning strategies be detected in on-line click stream data?

Data Mining-Data Shop Offerings

• Data Mining Track:
  • Tues 9:15 Using DataShop for Exploratory Data Analysis
  • Tues 1:30 Learning from learning curves
    • Item Response Theory
    • Learning Factors Analysis
  • Wed 9:30 Discovery with Models
• General lecture:
  • Tues 3:30 Educational Data Mining
    • Bayesian models of knowledge tracing
    • Causal models with Tetrad


Questions?


Extra slides …


Sample tutor interactions (from 1997 version) that generated Geometry Area data set used in example of learning curves …


TWO_CIRCLES_IN_SQUARE problem: Initial screen


TWO_CIRCLES_IN_SQUARE problem: An error a few steps later


TWO_CIRCLES_IN_SQUARE problem: Student follows hint & completes prob

Learning curve contrast in Physics dataset …


Not a smooth learning curve -> this knowledge component model is wrong. Does not capture genuine student difficulties.

More detailed cognitive model yields smoother learning curve. Better tracks nature of student difficulties & transfer

(Few observations after 10 opportunities yield noisy data)

Best BIC (parsimonious fit) for Default (original) KC model

Better than simpler Single-KC model

And better than more complex Unique-step (IRT) model
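The comparison above rests on BIC, which rewards fit but penalizes parameters: BIC = k * ln(n) - 2 * ln(L), with lower values preferred. A minimal sketch of that comparison follows; the model names and numbers in the usage lines are invented for illustration and are not results from this deck.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def best_model(candidates, n_obs):
    """Pick the candidate with the lowest BIC.

    candidates: {name: (log_likelihood, n_params)}
    Returns (best_name, {name: bic_score}).
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
    return min(scores, key=scores.get), scores


# Invented numbers, for illustration only (not taken from the slides).
example = {
    "simple": (-3200.0, 61),
    "medium": (-2900.0, 90),
    "complex": (-2800.0, 260),
}
winner, scores = best_model(example, n_obs=5000)
```

In this made-up case the medium model wins: the complex model fits slightly better, but its extra parameters cost more than the gain in likelihood, which is the same trade-off the slide describes between the Single-KC, Default, and Unique-step models.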