Bayesian Knowledge Tracing and Other Predictive Models in Educational Data Mining
Zachary A. Pardos
PSLC Summer School 2011
Bayesian Knowledge Tracing & Other Models, PSLC Summer School 2011, Zach Pardos
Outline of Talk
• Introduction to Knowledge Tracing
– History
– Intuition
– Model
– Demo
– Variations (and other models)
– Evaluations (Baker work / KDD)
• Random Forests
– Description
– Evaluations (KDD)
• Time left?
– Vote on next topic
Intro to Knowledge Tracing
History
• Introduced in 1995 (Corbett & Anderson, UMUAI)
• Based on the ACT-R theory of skill knowledge (Anderson, 1993)
• Computations based on a variation of Bayesian calculations proposed in 1972 (Atkinson)
Intuition
• Based on the idea that practice on a skill leads to mastery of that skill
• Has four parameters used to describe student performance
• Relies on a KC model
• Tracks student knowledge over time
For some Skill K: given student Y's chronological response sequence 1 to n, predict response n+1
[0 = incorrect response, 1 = correct response]

Responses 1 … n: 0 0 0 1 1 1   Response n+1: ?
0 0 0 1 1 1 1
Track knowledge over time (model of learning)
Knowledge Tracing (KT) can be represented as a simple HMM

[Diagram: a chain of latent knowledge nodes K, each emitting an observed question node Q; P(L0) is the prior on the first K, transitions between K nodes are labeled P(T), and emissions are governed by P(G) and P(S)]

Node representations
K = Knowledge node (latent)
Q = Question node (observed)

Node states
K = two state (0 or 1)
Q = two state (0 or 1)

Model parameters
P(L0) = Probability of initial knowledge
P(T) = Probability of learning
P(G) = Probability of guess
P(S) = Probability of slip
Four parameters of the KT model:
P(L0) = Probability of initial knowledge
P(T) = Probability of learning
P(G) = Probability of guess
P(S) = Probability of slip

The probability of forgetting is assumed to be zero (fixed).
Formulas for inference and prediction
• Derivation (Reye, JAIED 2004)
• The formulas use Bayes' theorem to make inferences about the latent variable

If the response was correct:
P(Ln | correct) = P(Ln)(1 - P(S)) / [P(Ln)(1 - P(S)) + (1 - P(Ln))P(G)]   (1)

If the response was incorrect:
P(Ln | incorrect) = P(Ln)P(S) / [P(Ln)P(S) + (1 - P(Ln))(1 - P(G))]   (2)

Learning transition:
P(Ln+1) = P(Ln | evidence) + (1 - P(Ln | evidence)) P(T)   (3)
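The three update formulas can be sketched directly in Python (an illustrative implementation, not the talk's original MATLAB code):

```python
def posterior_knowledge(p_l, correct, p_g, p_s):
    """P(Ln | evidence): Bayes update of the knowledge estimate
    after observing a correct (1) or incorrect (0) response."""
    if correct:
        num = p_l * (1 - p_s)                 # knew it and did not slip
        den = num + (1 - p_l) * p_g           # or guessed without knowing
    else:
        num = p_l * p_s                       # knew it but slipped
        den = num + (1 - p_l) * (1 - p_g)     # or simply did not know
    return num / den

def advance_knowledge(p_l_given_obs, p_t):
    """P(Ln+1): apply the learning transition (no forgetting)."""
    return p_l_given_obs + (1 - p_l_given_obs) * p_t

def predict_correct(p_l, p_g, p_s):
    """Predicted P(correct) at the next opportunity."""
    return p_l * (1 - p_s) + (1 - p_l) * p_g
```

For example, with P(L) = 0.5, P(G) = 0.14, P(S) = 0.09, an incorrect response drops the estimate to 0.045 / 0.475 ≈ 0.095 before the learning transition is applied.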
Model Training Step
• Values of the parameters P(T), P(G), P(S) & P(L0) are used to predict student responses
• Ad-hoc values could be used but will likely not be the best fitting
• Goal: find a set of values for the parameters that minimizes prediction error

[Example: chronological response sequences for Students A, B, and C used to fit the parameters]
Model Tracing Step – Skill: Subtraction
Student's last three responses to Subtraction questions (in the Unit): 0 1 1

[Diagram: latent knowledge estimates P(K) across the three responses and the test set questions: 10%, 45%, 75%, 79%, 83%; predicted observable responses P(Q) for the test set questions: 71%, 74%]
Model Prediction: influence of parameter values

Estimate of knowledge for a student with response sequence: 0 1 1 1 1 1 1 1 1 1
P(L0): 0.50  P(T): 0.20  P(G): 0.14  P(S): 0.09
The student reached 95% probability of knowledge after the 4th opportunity
Estimate of knowledge for a student with the same response sequence: 0 1 1 1 1 1 1 1 1 1
P(L0): 0.50  P(T): 0.20  P(G): 0.64  P(S): 0.03
The student reached 95% probability of knowledge after the 8th opportunity
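The two parameter settings above can be replayed with a short script (my own illustrative code; the mastery criterion used here, the knowledge estimate entering each opportunity, reproduces the 4th/8th opportunity figures):

```python
def opportunities_to_mastery(responses, p_l0, p_t, p_g, p_s, threshold=0.95):
    """Return the 1-based opportunity at which the knowledge estimate
    (before seeing that opportunity's response) first reaches threshold."""
    p_l = p_l0
    for n, r in enumerate(responses, start=1):
        if p_l >= threshold:
            return n                       # estimate entering opportunity n
        if r:                              # Bayes update on the response
            post = p_l * (1 - p_s) / (p_l * (1 - p_s) + (1 - p_l) * p_g)
        else:
            post = p_l * p_s / (p_l * p_s + (1 - p_l) * (1 - p_g))
        p_l = post + (1 - post) * p_t      # learning transition
    return None

seq = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(opportunities_to_mastery(seq, 0.50, 0.20, 0.14, 0.09))  # 4
print(opportunities_to_mastery(seq, 0.50, 0.20, 0.64, 0.03))  # 8
```

With the high guess rate (0.64), each correct answer is weaker evidence of knowledge, so mastery takes twice as many opportunities.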
( Demo )
Variations on Knowledge Tracing (and other models)
Prior Individualization Approach
Do all students enter a lesson with the same background knowledge?
Knowledge Tracing with Individualized P(L0)

[Diagram: an observed multi-state student node S sets the prior on the first knowledge node via P(L0|S); the rest of the chain is standard KT with P(T), P(G), P(S)]

Node representations
K = Knowledge node
Q = Question node
S = Student node (observed)

Node states
K = two state (0 or 1)
Q = two state (0 or 1)
S = multi state (1 to N)
Conditional Probability Table of the Student node and the Individualized Prior node

CPT of the Student node:

S value  P(S=value)
1        1/N
2        1/N
3        1/N
…        …
N        1/N

• The CPT of the observed student node is fixed
• It is possible to have an S value for every student ID
• This raises an initialization issue (where do these prior values come from?)
• The S value can represent a cluster or type of student instead of an ID
Prior Individualization Approach

CPT of the Individualized Prior node:

S value  P(L0|S)
1        0.05
2        0.30
3        0.95
…        …
N        0.92

• Individualized L0 values need to be seeded
• This CPT can be fixed or the values can be learned
• Fixing this CPT and seeding it with values based on a student's first response can be an effective strategy

This model, which individualizes only L0, is the Prior Per Student (PPS) model.
Prior Individualization Approach

Bootstrapping the prior:

S value  P(L0|S)
0        0.05
1        0.30

• If a student answers incorrectly on the first question, she gets a low prior
• If a student answers correctly on the first question, she gets a higher prior
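The bootstrapping idea reduces to a two-row lookup keyed on the first response. A minimal sketch (the 0.05 / 0.30 seeds are the ad-hoc values from the slide):

```python
# Two-row CPT: individualized prior keyed on the first response.
PRIOR_CPT = {0: 0.05,   # first response incorrect -> low prior
             1: 0.30}   # first response correct   -> higher prior

def individualized_prior(first_response):
    """P(L0 | S) under the Prior Per Student (PPS) bootstrapping scheme."""
    return PRIOR_CPT[first_response]
```

The returned value then replaces the single shared P(L0) when tracing that student's skill.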
Prior Individualization Approach

What values to use for the two priors?
1. Use ad-hoc values (e.g. P(L0|S=0) = 0.10, P(L0|S=1) = 0.85)
2. Learn the values (with EM)
3. Link them with the guess/slip CPT: P(L0|S=0) = Slip, P(L0|S=1) = 1 - Guess
Prior Individualization Approach
With ASSISTments, PPS (ad-hoc) achieved an R2 of 0.301 (0.176 with KT)
(Pardos & Heffernan, UMAP 2010)
Variations on Knowledge Tracing (and other models)
1. BKT-BF (Baker et al., 2010)
Learns values for the four parameters by performing a grid search (0.01 granularity) and chooses the set of parameters with the best squared error.
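A minimal sketch of the brute-force idea (my own illustrative Python, not the original code): sweep a grid over the four parameters and keep the set with the lowest sum of squared prediction errors. A 0.01 grid is roughly 10^8 combinations, so in practice you would use a coarser step or restrict the ranges; the procedure is otherwise the same.

```python
import itertools

def bkt_predictions(responses, p_l0, p_t, p_g, p_s):
    """Predicted P(correct) before each response in the sequence."""
    p_l, preds = p_l0, []
    for r in responses:
        preds.append(p_l * (1 - p_s) + (1 - p_l) * p_g)
        if r:
            post = p_l * (1 - p_s) / (p_l * (1 - p_s) + (1 - p_l) * p_g)
        else:
            post = p_l * p_s / (p_l * p_s + (1 - p_l) * (1 - p_g))
        p_l = post + (1 - post) * p_t
    return preds

def brute_force_fit(sequences, step=0.05):
    """Grid-search all four parameters; return (best params, best SSE)."""
    grid = [round(step * i, 2) for i in range(1, int(1 / step))]
    best, best_sse = None, float("inf")
    for p_l0, p_t, p_g, p_s in itertools.product(grid, repeat=4):
        sse = sum((p - r) ** 2
                  for seq in sequences
                  for p, r in zip(bkt_predictions(seq, p_l0, p_t, p_g, p_s), seq))
        if sse < best_sse:
            best, best_sse = (p_l0, p_t, p_g, p_s), sse
    return best, best_sse
```

Because the search is exhaustive over the grid, it cannot get stuck in a local optimum the way EM can, at the cost of resolution and runtime.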
2. BKT-EM (Chang et al., 2006)
Learns values for the four parameters with Expectation Maximization (EM), maximizing the log likelihood fit to the data.
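EM maximizes the log likelihood of the response data. A minimal sketch (my own, for a single sequence) of how that likelihood is computed for the BKT HMM with the standard forward recursion; the E and M steps of full EM iterate expected counts on top of this and are omitted here:

```python
import math

def sequence_log_likelihood(responses, p_l0, p_t, p_g, p_s):
    """log P(responses | params) via the HMM forward recursion.
    State 'known' vs 'unknown'; no forgetting."""
    a_known, a_unknown = p_l0, 1 - p_l0        # forward probabilities
    log_lik = 0.0
    for r in responses:
        # Emission: P(response | state)
        e_known = (1 - p_s) if r else p_s
        e_unknown = p_g if r else (1 - p_g)
        a_known *= e_known
        a_unknown *= e_unknown
        # Normalize, accumulating the log of the scaling factor
        scale = a_known + a_unknown
        log_lik += math.log(scale)
        a_known, a_unknown = a_known / scale, a_unknown / scale
        # Transition: unknown -> known with probability P(T)
        a_known, a_unknown = a_known + a_unknown * p_t, a_unknown * (1 - p_t)
    return log_lik
```

EM picks the parameter set with the highest total log likelihood over all training sequences, whereas BKT-BF minimizes squared error; the two criteria can disagree.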
3. BKT-CGS (Baker, Corbett, & Aleven, 2008)
Guess and slip parameters are assessed contextually using a regression on features generated from student performance in the tutor.
4. BKT-CSlip (Baker, Corbett, & Aleven, 2008)
Uses the student's averaged contextual Slip parameter learned across all incorrect actions.
5. BKT-LessData (Nooraiei et al., 2011)
Limits each student's response sequence to the most recent 15 responses (max) during EM training.
6. BKT-PPS (Pardos & Heffernan, 2010)
Prior per student (PPS) model which individualizes the prior parameter. Students are assigned a prior based on their response to the first question.
7. CFAR (Yu et al., 2010)
Correct on First Attempt Rate (CFAR) calculates the student's percent correct on the current skill up until the question being predicted.

Student responses for Skill X: 0 1 0 1 0 1 _
Predicted next response would be 0.50
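CFAR is just a running mean, so the sketch is a one-liner (illustrative code, not the original):

```python
def cfar_predict(prior_responses):
    """Percent correct on the current skill so far = predicted next response."""
    return sum(prior_responses) / len(prior_responses)

print(cfar_predict([0, 1, 0, 1, 0, 1]))  # 0.5, as on the slide
```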
8. Tabling (Wang et al., 2011)
Uses the student's response sequence (max length 3) to predict the next response by looking up the average next response among students with the same sequence in the training set.

Training set
Student A: 0 1 1 0
Student B: 0 1 1 1
Student C: 0 1 1 1

Test set student: 0 0 1 _
Predicted next response would be 0.66

With the max table length set to 3, the table size was 2^0 + 2^1 + 2^2 + 2^3 = 15.
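A minimal sketch of building the table (my own illustrative code): for every position in every training sequence, record the next response under the key of the preceding (up to length-3) responses, then average.

```python
from collections import defaultdict

def build_table(training_sequences, max_len=3):
    """Map each observed preceding sequence (length 1..max_len)
    to the average next response in the training data."""
    table = defaultdict(list)
    for seq in training_sequences:
        for i in range(1, len(seq)):
            key = tuple(seq[max(0, i - max_len):i])
            table[key].append(seq[i])
    return {k: sum(v) / len(v) for k, v in table.items()}

train = [[0, 1, 1, 0], [0, 1, 1, 1], [0, 1, 1, 1]]
table = build_table(train)
print(table[(0, 1, 1)])  # 2/3, matching the slide's 0.66
```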
9. PFA (Pavlik et al., 2009)
Performance Factors Analysis (PFA). A logistic regression model which elaborates on the Rasch IRT model. Predicts performance based on the counts of the student's prior failures and successes on the current skill.
An overall difficulty parameter β is also fit for each skill or each item; in this study we use the variant of PFA that fits β for each skill. The PFA equation is:

m(i, j ∈ KCs, s, f) = Σ_j (β_j + γ_j s_{i,j} + ρ_j f_{i,j})
P(m) = 1 / (1 + e^(-m))
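A hedged sketch of the PFA prediction (my own code; the β, γ, ρ values used in the test are illustrative, not fitted ones):

```python
import math

def pfa_predict(counts, params):
    """counts: {skill: (successes, failures)} for the skills of the item.
    params: {skill: (beta, gamma, rho)} per-skill coefficients.
    Returns P(correct) via the logistic link."""
    m = sum(beta + gamma * s + rho * f
            for skill, (s, f) in counts.items()
            for beta, gamma, rho in [params[skill]])
    return 1 / (1 + math.exp(-m))
```

With no prior practice (s = f = 0) and β = 0, the prediction is 0.5; successes (γ > 0) push it up and failures (ρ < 0) push it down.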
Study: Dataset
• Cognitive Tutor for Genetics
– 76 CMU undergraduate students
– 9 skills (no multi-skill steps)
– 23,706 problem solving attempts
– 11,582 problem steps in the tutor
– 152 average problem steps completed per student (SD = 50)
– Pre- and post-tests were administered with this assignment
Study: Methodology (evaluation of model in-tutor prediction)
• Predictions were made by the 9 models using 5-fold cross-validation by student

[Table: per-response predictions from each model (BKT-BF, BKT-EM, …) next to the actual response, e.g. Student 1, Skill A, Resp 1: 0.10, 0.22, actual 0]

• Accuracy was calculated with A' for each student. Those values were then averaged across students to report the model's A' (higher is better)
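A' is equivalent to the area under the ROC curve: the probability that a randomly chosen correct response receives a higher prediction than a randomly chosen incorrect one (ties count as 1/2). A pure-Python sketch of the per-student averaging described above (illustrative, not the original evaluation code):

```python
def a_prime(predictions, actuals):
    """Rank-based A' (= AUC) for one student's predictions."""
    pos = [p for p, a in zip(predictions, actuals) if a == 1]
    neg = [p for p, a in zip(predictions, actuals) if a == 0]
    if not pos or not neg:
        return None                     # undefined if only one class observed
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_a_prime(per_student):
    """per_student: list of (predictions, actuals) pairs, one per student."""
    scores = [a_prime(p, a) for p, a in per_student]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores)
```

Averaging per student (rather than pooling all actions) is why the study reports two sets of A' numbers later on.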
Study Results: in-tutor model prediction

Model          A'
BKT-PPS        0.7029
BKT-BF         0.6969
BKT-EM         0.6957
BKT-LessData   0.6839
PFA            0.6629
Tabling        0.6476
BKT-CSlip      0.6149
CFAR           0.5705
BKT-CGS        0.4857

A' results averaged across students
Significance (A' results averaged across students):
• No significant differences within the top group of BKT models
• Significant differences between those BKT models and PFA
Study: Methodology (ensemble in-tutor prediction)
• 5 ensemble methods were used, trained with the same 5-fold cross-validation folds
• The ensemble methods were trained using the 9 model predictions as the features and the actual response as the label

[Table: the same per-response prediction table as before; the model predictions are the features, the actual response is the label]
Ensemble methods used:
1. Linear regression with no feature selection (predictions bounded between {0,1})
2. Linear regression with feature selection (stepwise regression)
3. Linear regression with only BKT-PPS & BKT-EM
4. Linear regression with only BKT-PPS, BKT-EM & BKT-CSlip
5. Logistic regression
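A sketch of ensemble #1 under stated assumptions (NumPy least squares; a toy two-model feature matrix): the model predictions are the regressors, and the combined prediction is clipped to the [0,1] bounds mentioned above. The feature-selection and logistic variants differ only in the fitting step.

```python
import numpy as np

def fit_linreg_ensemble(features, labels):
    """features: (n_responses, n_models) model predictions;
    labels: 0/1 actual responses. Returns OLS weights with intercept."""
    X = np.column_stack([np.ones(len(features)), features])
    w, *_ = np.linalg.lstsq(X, labels, rcond=None)
    return w

def ensemble_predict(w, features):
    X = np.column_stack([np.ones(len(features)), features])
    return np.clip(X @ w, 0.0, 1.0)     # bound predictions between {0,1}
```

In the study the feature matrix would have 9 columns, one per model, with each row a single in-tutor response.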
Study Results: in-tutor ensemble prediction

Model                                               A'
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip   0.7028
Ensemble: LinReg with BKT-PPS & BKT-EM              0.6973
Ensemble: LinReg without feature selection          0.6945
Ensemble: LinReg with feature selection (stepwise)  0.6954
Ensemble: Logistic without feature selection        0.6854

A' results averaged across students
No significant difference between the ensembles
Study Results: in-tutor ensemble & model prediction

Model                                               A'
BKT-PPS                                             0.7029
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip   0.7028
Ensemble: LinReg with BKT-PPS & BKT-EM              0.6973
BKT-BF                                              0.6969
BKT-EM                                              0.6957
Ensemble: LinReg without feature selection          0.6945
Ensemble: LinReg with feature selection (stepwise)  0.6954
Ensemble: Logistic without feature selection        0.6854
BKT-LessData                                        0.6839
PFA                                                 0.6629
Tabling                                             0.6476
BKT-CSlip                                           0.6149
CFAR                                                0.5705
BKT-CGS                                             0.4857

A' results averaged across students
Study Results: in-tutor ensemble & model prediction

Model                                                     A'
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip         0.7451
Ensemble: LinReg without feature selection                0.7428
Ensemble: LinReg with feature selection (stepwise)        0.7423
Ensemble: Logistic regression without feature selection   0.7359
Ensemble: LinReg with BKT-PPS & BKT-EM                    0.7348
BKT-EM                                                    0.7348
BKT-BF                                                    0.7330
BKT-PPS                                                   0.7310
PFA                                                       0.7277
BKT-LessData                                              0.7220
CFAR                                                      0.6723
Tabling                                                   0.6712
Contextual Slip                                           0.6396
BKT-CGS                                                   0.4917

A' results calculated across all actions
In the KDD Cup
• Motivation for trying a non-KT approach: the Bayesian method only uses KC, opportunity count, and student as features. Much information is left unutilized, so another machine learning method is required.
• Strategy: engineer additional features from the dataset and use Random Forests to train a model.
Random Forests
Random Forests
• Strategy: create rich feature datasets that include features created from features not included in the test set

[Diagram: the raw training dataset rows are split into non-validation training rows (nvtrain), Validation set 1 (val1), and Validation set 2 (val2); feature-rich versions are built for each (frval1, frval2), plus a Feature Rich Test set (frtest) from the raw test dataset rows]
• Created by Leo Breiman
• The method trains T separate decision tree classifiers (50-800)
• Each decision tree selects a random 1/P portion of the available features (1/3)
• Each tree is grown until there are at least M observations in the leaf (1-100)
• When classifying unseen data, each tree votes on the class; the popular vote wins (or the votes are averaged, for regression)
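The configuration above can be sketched with scikit-learn's RandomForestRegressor (an assumption on my part: the talk used MATLAB's Statistics Toolbox, and the toy data and specific T/M values here are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 6))                                    # toy feature matrix
y = (X[:, 0] + 0.1 * rng.random(200) > 0.5).astype(float)   # toy 0/1 target

rf = RandomForestRegressor(
    n_estimators=200,       # T trees (50-800 in the talk)
    max_features=1 / 3,     # random 1/P portion of features per split
    min_samples_leaf=5,     # grow until at least M observations per leaf
    random_state=0,
)
rf.fit(X, y)
preds = rf.predict(X)       # regression: average of the trees' votes
```

Because each leaf averages 0/1 targets, the regression output is already a probability-like value in [0, 1], which is convenient for RMSE-scored prediction tasks like the KDD Cup.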
Feature Importance: features extracted from the training set
• Student progress features (avg. importance: 1.67)
– Number of data points [today, since the start of unit]
– Number of correct responses out of the last [3, 5, 10]
– Z-score sum for step duration, hint requests, incorrects
– Skill-specific versions of all these features
• Percent correct features (avg. importance: 1.60)
– % correct of unit, section, problem, and step, and total for each skill and also for each student (10 features)
• Student Modeling Approach features (avg. importance: 1.32)
– The predicted probability of correct for the test row
– The number of data points used in training the parameters
– The final EM log likelihood fit of the parameters / data points
• Features of the user were more important in Bridge to Algebra than in Algebra
• Student progress / gaming-the-system features (Baker et al., UMUAI 2008) were important in both datasets
Algebra
Rank  Feature set          RMSE    Coverage
1     All features         0.2762  87%
2     Percent correct+     0.2824  96%
3     All features (fill)  0.2847  97%

Bridge to Algebra
Rank  Feature set          RMSE    Coverage
1     All features         0.2712  92%
2     All features (fill)  0.2791  99%
3     Percent correct+     0.2800  98%
• The best Bridge to Algebra RMSE on the Leaderboard was 0.2777
• The Random Forest RMSE of 0.2712 here is exceptional
• Skill data for a student was not always available for each test row
• Because of this, many skill-related feature sets only had 92% coverage
Conclusion from KDD
• Combining user features with skill features was very powerful in both modeling and classification approaches
• Model-tracing-based predictions performed formidably against pure machine learning techniques
• Random Forests also performed very well on this educational data set compared to other approaches such as Neural Networks and SVMs. This method could significantly boost accuracy in other EDM datasets.
Hardware/Software
• Software
– MATLAB used for all analysis
(Bayes Net Toolbox for the Bayesian Network models; Statistics Toolbox for the Random Forests classifier)
– Perl used for pre-processing
• Hardware
– Two Rocks clusters used for skill model training
(178 CPUs in total; training the KT models took ~48 hours when utilizing all CPUs)
– Two 32 GB RAM systems for Random Forests
(RF models took ~16 hours to train with 800 trees)
Time left? Choose the next topic:
• KT: slides 1-35
• Prediction: slides 36-67
• Evaluation: slides 47-77
• Sig tests: slides 69-77
• Regression/sig tests: slides 80-112
Individualize Everything?
Fully Individualized Model: the Student-Skill Interaction (SSI) Model
(Pardos & Heffernan, JMLR 2011)

Model Parameters
P(L0) = Probability of initial knowledge
P(L0|Q1) = Individual cold start P(L0)
P(T) = Probability of learning
P(T|S) = Students' individual P(T)
P(G) = Probability of guess
P(G|S) = Students' individual P(G)
P(S) = Probability of slip
P(S|S) = Students' individual P(S)
(Parameters in bold are learned from data while the others are fixed)

Node representations
K = Knowledge node, Q = Question node, S = Student node, Q1 = first response node, T = Learning node, G = Guessing node, S = Slipping node

Node states
K, Q, Q1, T, G, S = two state (0 or 1)
S (student) = multi state (1 to N, where N is the number of students in the training data)

[Diagram: the student node S conditions per-student learning (T), guess (G), and slip (S) nodes; the first response Q1 sets P(L0|Q1); the K chain uses P(T|S) transitions and P(G|S), P(S|S) emissions]
• S identifies the student
• T contains the CPT lookup table of individual student learn rates
• P(T) is trained for each skill, which gives a learn rate for P(T|T=1) [high learner] and P(T|T=0) [low learner]
SSI model results

[Chart: RMSE of PPS vs SSI on the Algebra and Bridge to Algebra datasets, y-axis roughly 0.279 to 0.286]

Dataset            New RMSE  Prev RMSE  Improvement
Algebra            0.2813    0.2835     0.0022
Bridge to Algebra  0.2824    0.2860     0.0036

The average improvement is the difference between 1st and 3rd place; it is also the difference between 3rd and 4th place.
The differences between PPS and SSI are significant in each dataset at the P < 0.01 level (t-test of squared errors).
(Pardos & Heffernan, JMLR 2011)