Choosing Sample Size for Knowledge Tracing Models
DERRICK COETZEE
Motivation
◦ BKT parameters are inferred from data
◦ But best solution for a given data set may not quite match the parameters that actually generated it (sampling error)
0,0,0,0,0
0,0,0,0,0
0,1,1,0,1
0,1,0,0,0
0,0,1,1,0
5 students, 5 problems each, 25 bits of data
prior = 0.205, learning = 0.010, guess = 0.142, slip = 0.031
4 parameters, 3 decimal digits each, 39.9 bits of data
Not even possible for all parameter sets to be represented!
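The bit counts on this slide can be checked with a few lines of arithmetic. This is only a sketch of the counting argument; the variable names are illustrative and not from the talk.

```python
import math

# Response data: one binary outcome per student per problem.
students, problems = 5, 5
data_bits = students * problems

# Parameters: each decimal digit carries log2(10) bits of information.
n_params, digits = 4, 3
param_bits = n_params * digits * math.log2(10)

print(f"data: {data_bits} bits, parameters: {param_bits:.1f} bits")

# Counting argument: fewer distinct data sets than distinct parameter sets,
# so some parameter sets cannot be the best fit of any data set.
distinct_datasets = 2 ** data_bits
distinct_param_sets = 10 ** (n_params * digits)
print(distinct_datasets < distinct_param_sets)
```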
Questions
◦ So how much data is needed for accurate estimates?
◦ And do the parameter values affect how much you need?
◦ Can we give confidence intervals for parameters?
Normal distribution over samples
◦ Mean is almost always near true generating value
◦ Standard deviation can be used to describe variation of estimates
◦ Can use 68–95–99.7 rule for confidence intervals
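As a minimal sketch of the 68–95–99.7 rule, the helper below turns a parameter estimate and its standard deviation into approximate intervals at 1, 2, and 3 standard deviations. It assumes the estimates are roughly normally distributed, as the slide states. The function name and the example numbers (a stddev of 0.02) are hypothetical, not results from the talk.

```python
def sigma_intervals(mean, stddev):
    """Approximate 68%, 95%, and 99.7% intervals under a normal assumption."""
    return {
        coverage: (mean - k * stddev, mean + k * stddev)
        for k, coverage in ((1, 0.68), (2, 0.95), (3, 0.997))
    }

# Hypothetical example: an estimated guess of 0.142 with stddev 0.02.
intervals = sigma_intervals(0.142, 0.02)
for coverage, (low, high) in intervals.items():
    print(f"{coverage:.1%}: [{low:.3f}, {high:.3f}]")
```

Because the parameters are probabilities, an interval that extends below 0 or above 1 would need to be clipped or handled some other way; the sketch does not do this.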
Variation does depend on parameter values
◦ Each parameter behaves differently
◦ Best estimates for parameters near zero/one, worst in 0.5-0.8 range
There are interactions between parameter values
◦ Can’t just precompute a table of stddevs for each parameter
◦ Complex relationship, analytical approach probably infeasible
◦ But at least there is continuity with small rates of change
Sample size recommendations
◦ Stddev proportional to 1/sqrt(n)
◦ Must increase sample size by factor of 4 to improve error by factor of 2
◦ Small data sets (<1000 students) will not give even one sigfig in all parameters
◦ Question systems based on small classes!
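The 1/sqrt(n) relationship can be written as two small helpers: one rescales a known stddev to another sample size, the other inverts the relationship to find the sample size needed for a target stddev. The function names and the example stddev of 0.04 are hypothetical.

```python
import math

def scaled_stddev(stddev, n_from, n_to):
    """Rescale a stddev measured at n_from students to n_to students."""
    return stddev * math.sqrt(n_from / n_to)

def required_n(stddev, n_from, target_stddev):
    """Students needed to reach target_stddev, given stddev at n_from."""
    return math.ceil(n_from * (stddev / target_stddev) ** 2)

# Quadrupling the sample size halves the error.
print(scaled_stddev(0.04, 1000, 4000))
# Halving the error requires four times the students.
print(required_n(0.04, 1000, 0.02))
```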
No interaction between sample size and parameters
◦ Change sample size without changing parameters → predictable variation in error
◦ Gives an approach to estimate error on real-world data sets:
◦ Take samples with replacement, infer parameters for each, compute stddev
◦ Scale using 1/sqrt(n) to estimate stddevs at other sample sizes
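The resampling approach above might be sketched as follows. The `fit` argument stands for whatever procedure infers a parameter from a set of student response sequences. To keep the sketch self-contained and runnable, a simple stand-in estimator is used here instead of a real BKT fit, and the data are randomly generated placeholders. All names are hypothetical.

```python
import math
import random
import statistics

def bootstrap_stddev(students, fit, n_resamples=200, seed=0):
    """Stddev of fit() over resamples of students drawn with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(students, k=len(students))
        estimates.append(fit(resample))
    return statistics.stdev(estimates)

def fraction_first_correct(students):
    # Stand-in estimator, NOT a BKT fit: share of correct first responses.
    return sum(s[0] for s in students) / len(students)

# Placeholder data: 400 students, 5 binary responses each.
data_rng = random.Random(1)
students = [[int(data_rng.random() < 0.3) for _ in range(5)] for _ in range(400)]

sd_400 = bootstrap_stddev(students, fraction_first_correct)
# Scale by 1/sqrt(n) to estimate the stddev at 1600 students.
sd_1600 = sd_400 * math.sqrt(400 / 1600)
print(sd_400, sd_1600)
```

With a real BKT fitting routine returning several parameters, the same loop would be run once and the stddev computed separately for each parameter.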
Knowledge Tracing for Interacting Student Pairs
DERRICK COETZEE
Motivation
◦ Standard Bayesian knowledge tracing uses fixed learning rate parameter to capture all learning
◦ One way to improve: use information on course materials viewed
◦ What about peer interaction (e.g. forums/chat)?
◦ Not fixed/static like instructional materials
◦ The level of knowledge of the other student is important
◦ Use our BKT model of the other student’s knowledge!
Pair interaction scenario
◦ Simple case of student interaction
◦ Two students are paired and always interact between each item (no interactions with others)
Do exercise → Learn independently → Interact with partner → Do exercise → Learn independently
◦ Model independent learning and interaction stages
◦ New parameters: teach, mislead

Knows   Other student knows   Probability knows after interaction
No      No                    0
Yes     Yes                   1
No      Yes                   teach
Yes     No                    1−mislead
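The table can be turned into an update on knowledge probabilities by weighting each row by the probability of that combination of states. The sketch below assumes the two students' knowledge states are independent given the model's current estimates; that is an assumption of this example, not something stated in the talk. The function name is hypothetical.

```python
def p_knows_after_interaction(p_self, p_other, teach, mislead=0.0):
    """Probability a student knows the skill after interacting with a partner.

    ASSUMPTION: the two students' knowledge states are treated as independent.
    """
    both = p_self * p_other                 # Yes / Yes -> 1
    only_other = (1 - p_self) * p_other     # No / Yes  -> teach
    only_self = p_self * (1 - p_other)      # Yes / No  -> 1 - mislead
    # No / No contributes 0.
    return both + only_other * teach + only_self * (1 - mislead)

# A student who does not know, paired with one who does, learns with
# probability teach.
print(p_knows_after_interaction(0.0, 1.0, teach=0.9))
```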
Results: Preliminary simulations
◦ 5-parameter system (prior, learn, guess, slip, teach)
◦ forget, mislead parameters fixed at zero
◦ Generate synthetic data, run EM from generating values
◦ Same behavior as classic system when teach = 0
◦ Unstable when teach > 0
◦ Converges to trivial solution prior=learn=teach=1, slip=proportion of incorrect responses
◦ Occurs for both small and large teach parameters
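One way synthetic data for a single pair might be generated is sketched below, following the stage order do exercise, learn independently, interact with partner. This is not the author's simulation code. The function names are hypothetical, forgetting is omitted (fixed at zero as on the slide), and both students are assumed to update simultaneously from their pre-interaction states.

```python
import random

def _after_interaction(knows_self, knows_other, teach, mislead, rng):
    """Sample one student's state after interaction, per the table rows."""
    if knows_self and knows_other:
        return True
    if knows_other:                      # only the partner knows
        return rng.random() < teach
    if knows_self:                       # only this student knows
        return rng.random() >= mislead   # stays known with prob 1 - mislead
    return False

def simulate_pair(n_items, prior, learn, guess, slip, teach, mislead=0.0, rng=None):
    """Return two lists of 0/1 responses, one per student in the pair."""
    rng = rng or random.Random()
    knows = [rng.random() < prior, rng.random() < prior]
    responses = [[], []]
    for _ in range(n_items):
        # Do exercise: correct with prob 1 - slip if known, else guess.
        for i in (0, 1):
            p_correct = (1 - slip) if knows[i] else guess
            responses[i].append(int(rng.random() < p_correct))
        # Learn independently.
        knows = [k or rng.random() < learn for k in knows]
        # Interact with partner (ASSUMPTION: simultaneous update).
        a, b = knows
        knows = [
            _after_interaction(a, b, teach, mislead, rng),
            _after_interaction(b, a, teach, mislead, rng),
        ]
    return responses

pair = simulate_pair(5, prior=0.2, learn=0.09, guess=0.14, slip=0.09,
                     teach=0.9, rng=random.Random(0))
print(pair)
```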
Results: Preliminary simulations
◦ 4-parameter system (learn, guess, slip, teach)
◦ forget, mislead, prior fixed at zero
◦ For small teach values (e.g. 0.05), teach converges to zero
◦ Yields nontrivial solutions for large teach values, but other parameters absorb some of the teach:
◦ learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 100 students → learn=0.1586, guess=0.1648, slip=0.0856, teach=0.6481
◦ learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 1000 students → learn=0.1643, guess=0.1940, slip=0.1102, teach=0.7225
Results: Preliminary simulations
◦ 4-parameter system (learn, guess, slip, teach) with 10000 students and high teach
◦ prior=0.0000, learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000 → prior=0.2184, learn=0.0841, guess=0.1239, slip=0.2658, teach=0.8793
◦ prior and slip have high error, but learning/guess/teach are good
◦ teach accuracy increases dramatically with sample size
Possible solutions
◦ Answer items between independent learning and interaction (more observed data)
◦ Mentor/mentee model: knowledge flows in only one direction
◦ Eliminate different parameters, or combine parameters to create lower-dimensional space
Future work
◦ Determine whether interaction model produces better predictions on synthetic data
◦ Gather real-world pair interaction data using MOOCchat tool
◦ Determine whether pair interaction produces better predictions
◦ Typical values, appropriate interpretations for teach and mislead parameters?
◦ Generalize to more complex interactions