Choosing Sample Size for Knowledge Tracing Models DERRICK COETZEE

Motivation
◦ BKT parameters are inferred from data
◦ But best solution for a given data set may not quite match the parameters that actually generated it (sampling error)

0,0,0,0,0
0,0,0,0,0
0,1,1,0,1
0,1,0,0,0
0,0,1,1,0

5 students, 5 problems each, 25 bits of data

prior = 0.205
learning = 0.010
guess = 0.142
slip = 0.031

4 parameters, 3 decimal digits each, 39.9 bits of data

Not even possible for all parameter sets to be represented!
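
The bit counts above can be checked with a few lines of arithmetic. This is only a sketch of the counting argument; the variable names are illustrative and not from the talk.

```python
import math

# Data side: 5 students x 5 problems, one binary response each.
n_students, n_problems = 5, 5
data_bits = n_students * n_problems

# Parameter side: 4 parameters, 3 decimal digits each.
n_params, digits = 4, 3
param_bits = n_params * digits * math.log2(10)

print(f"data: {data_bits} bits, parameters: {param_bits:.1f} bits")

# Fewer distinct data sets than distinct parameter sets, so some
# parameter sets cannot be the best fit for any data set of this size.
distinct_datasets = 2 ** data_bits
distinct_param_sets = 10 ** (n_params * digits)
print(distinct_datasets < distinct_param_sets)
```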

Questions
◦ So how much data is needed for accurate estimates?
◦ And do the parameter values affect how much you need?
◦ Can we give confidence intervals for parameters?

Normal distribution over samples
◦ Mean is almost always near true generating value
◦ Standard deviation can be used to describe variation of estimates
◦ Can use 68–95–99.7 rule for confidence intervals
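
As a rough illustration of the 68–95–99.7 rule, the sketch below turns an estimate and its standard deviation into three intervals. The function name and the standard deviation of 0.020 are hypothetical, chosen only for the example; they are not results from the talk.

```python
def normal_intervals(estimate, stddev):
    """Intervals at 1, 2 and 3 standard deviations around an estimate."""
    rule = {"68%": 1, "95%": 2, "99.7%": 3}
    return {
        level: (estimate - k * stddev, estimate + k * stddev)
        for level, k in rule.items()
    }

# Hypothetical example: a guess estimate of 0.142 with an assumed stddev of 0.020.
intervals = normal_intervals(estimate=0.142, stddev=0.020)
for level, (low, high) in intervals.items():
    print(f"{level}: {low:.3f} to {high:.3f}")
```

Since BKT parameters are probabilities, intervals that extend below 0 or above 1 would need to be clipped; the sketch does not do this.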

Variation does depend on parameter values
◦ Each parameter behaves differently
◦ Best estimates for parameters near zero/one, worst in the 0.5-0.8 range

There are interactions between parameter values
◦ Can’t just precompute a table of stddevs for each parameter
◦ Complex relationship, analytical approach probably infeasible
◦ But at least there is continuity with small rates of change

Sample size recommendations
◦ Stddev proportional to 1/sqrt(n)
◦ Must increase sample size by factor of 4 to improve error by factor of 2
◦ Small data sets (<1000 students) will not give even one sigfig in all parameters
◦ Question systems based on small classes!
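
The 1/sqrt(n) relationship can be turned into a small sample-size calculator. This is a sketch under the assumption that a reference stddev at some reference sample size is already known; the function names and the numbers are illustrative, not values from the talk.

```python
import math

def scaled_stddev(stddev_ref, n_ref, n_new):
    """Stddev expected at n_new students, given stddev_ref at n_ref students."""
    return stddev_ref * math.sqrt(n_ref / n_new)

def required_students(stddev_ref, n_ref, target_stddev):
    """Smallest sample size expected to reach target_stddev."""
    return math.ceil(n_ref * (stddev_ref / target_stddev) ** 2)

# Hypothetical reference point: stddev 0.04 observed with 1000 students.
print(scaled_stddev(0.04, n_ref=1000, n_new=4000))        # half the error at 4x the data
print(required_students(0.04, n_ref=1000, target_stddev=0.02))
```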

No interaction between sample size and parameters
◦ Change sample size without changing parameters → predictable variation in error
◦ Gives an approach to estimate error on real-world data sets:
  ◦ Take samples with replacement, infer parameters for each, compute stddev
  ◦ Scale using 1/sqrt(n) to estimate stddevs at other sample sizes
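
The resampling approach above can be sketched as follows. The talk infers BKT parameters for each resample; to keep the example self-contained, `first_response_rate` is a hypothetical stand-in fitter (not a BKT fit), and any callable that maps a list of student response sequences to a dict of parameter estimates could be passed instead. All names here are assumptions of the sketch.

```python
import math
import random
import statistics

def bootstrap_stddev(students, fit, n_boot=200, seed=0):
    """Stddev of each fitted parameter over resamples drawn with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(students, k=len(students))
        estimates.append(fit(resample))
    return {
        name: statistics.stdev([e[name] for e in estimates])
        for name in estimates[0]
    }

def scale_stddev(stddevs, n_observed, n_target):
    """Rescale stddevs to another sample size using the 1/sqrt(n) rule."""
    factor = math.sqrt(n_observed / n_target)
    return {name: s * factor for name, s in stddevs.items()}

def first_response_rate(students):
    """Stand-in fitter (NOT BKT): fraction of correct first responses."""
    return {"first_correct": sum(s[0] for s in students) / len(students)}

# Synthetic placeholder data: 400 students, 5 binary responses each.
data_rng = random.Random(1)
students = [[int(data_rng.random() < 0.3) for _ in range(5)] for _ in range(400)]

sd = bootstrap_stddev(students, first_response_rate)
sd_at_1600 = scale_stddev(sd, n_observed=400, n_target=1600)
print(sd, sd_at_1600)
```

Resampling whole students, rather than individual responses, keeps each student's response sequence intact, which matters because responses within a sequence are not independent.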

Knowledge Tracing for Interacting Student Pairs DERRICK COETZEE

Motivation
◦ Standard Bayesian knowledge tracing uses fixed learning rate parameter to capture all learning
◦ One way to improve: use information on course materials viewed
◦ What about peer interaction (e.g. forums/chat)?
◦ Not fixed/static like instructional materials
◦ The level of knowledge of the other student is important
◦ Use our BKT model of the other student’s knowledge!

Pair interaction scenario
◦ Simple case of student interaction
◦ Two students are paired and always interact between each item (no interactions with others)

Do exercise → Learn independently → Interact with partner → Do exercise → Learn independently

◦ Model independent learning and interaction stages
◦ New parameters: teach, mislead

Knows   Other student knows   Probability knows after interaction
No      No                    0
Yes     Yes                   1
No      Yes                   teach
Yes     No                    1−mislead
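
The table can be read as an update rule on knowledge probabilities. The sketch below computes the probability that a student knows the skill after interacting, assuming the two students' knowledge states are independent before the interaction; that independence is an assumption of this example, not something stated on the slide, and the function name is illustrative.

```python
def knows_after_interaction(p_self, p_other, teach, mislead=0.0):
    """P(student knows after interaction), weighting each table row by
    the probability of that row (independence of the two students assumed)."""
    both_know = p_self * p_other            # Yes / Yes -> 1
    can_be_taught = (1 - p_self) * p_other  # No  / Yes -> teach
    can_be_misled = p_self * (1 - p_other)  # Yes / No  -> 1 - mislead
    # No / No contributes 0.
    return both_know + can_be_taught * teach + can_be_misled * (1 - mislead)

# Illustrative values only.
print(knows_after_interaction(0.3, 0.8, teach=0.9))
```

With teach = 0 and mislead = 0 the function returns `p_self` unchanged, so the interaction stage drops out.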

Results: Preliminary simulations
◦ 5-parameter system (prior, learn, guess, slip, teach)
◦ forget, mislead parameters fixed at zero
◦ Generate synthetic data, run EM from generating values
◦ Same behavior as classic system when teach = 0
◦ Unstable when teach > 0
◦ Converges to trivial solution prior=learn=teach=1, slip=proportion of incorrect responses
◦ Occurs for both small and large teach parameters
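
A minimal sketch of how synthetic data for one student pair might be generated, following the stage order do exercise, learn independently, interact with partner. The talk does not show its generator, so the function names, the simultaneous update of both students during interaction, and the omission of forgetting are all assumptions of this example. The EM fitting step is not shown.

```python
import random

def interact(knows, other_knows, teach, mislead, rng):
    """Apply the pair interaction table to one student's knowledge state."""
    if knows == other_knows:
        return knows                   # No/No -> 0, Yes/Yes -> 1
    if other_knows:
        return rng.random() < teach    # No/Yes -> teach
    return rng.random() >= mislead     # Yes/No -> 1 - mislead

def simulate_pair(n_items, prior, learn, guess, slip, teach, mislead=0.0, rng=None):
    """Return two lists of 0/1 responses, one per student in the pair."""
    rng = rng or random.Random()
    knows = [rng.random() < prior, rng.random() < prior]
    responses = ([], [])
    for _ in range(n_items):
        # Do exercise: correct with probability 1 - slip if known, else guess.
        for i in (0, 1):
            p_correct = 1 - slip if knows[i] else guess
            responses[i].append(int(rng.random() < p_correct))
        # Learn independently (forget fixed at zero).
        knows = [k or rng.random() < learn for k in knows]
        # Interact with partner, both updated from the pre-interaction states.
        a, b = knows
        knows = [interact(a, b, teach, mislead, rng),
                 interact(b, a, teach, mislead, rng)]
    return responses

first, second = simulate_pair(5, prior=0.2, learn=0.09, guess=0.14,
                              slip=0.09, teach=0.9, rng=random.Random(0))
print(first, second)
```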

Results: Preliminary simulations
◦ 4-parameter system (learn, guess, slip, teach)
◦ forget, mislead, prior fixed at zero
◦ For small teach values (e.g. 0.05), teach converges to zero
◦ Yields nontrivial solutions for large teach values, but other parameters absorb some of the teach:
  ◦ learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 100 students → learn=0.1586, guess=0.1648, slip=0.0856, teach=0.6481
  ◦ learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 1000 students → learn=0.1643, guess=0.1940, slip=0.1102, teach=0.7225

Results: Preliminary simulations
◦ 4-parameter system (learn, guess, slip, teach) with 10000 students and high teach
◦ prior=0.0000, learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000 → prior=0.2184, learn=0.0841, guess=0.1239, slip=0.2658, teach=0.8793
◦ prior and slip have high error, but learning/guess/teach are good
◦ teach accuracy increases dramatically with sample size

Possible solutions
◦ Answer items between independent learning and interaction (more observed data)
◦ Mentor/mentee model: knowledge flows in only one direction
◦ Eliminate different parameters, or combine parameters to create lower-dimensional space

Future work
◦ Determine whether interaction model produces better predictions on synthetic data
◦ Gather real-world pair interaction data using MOOCchat tool
◦ Determine whether pair interaction produces better predictions
◦ Typical values, appropriate interpretations for teach and mislead parameters?
◦ Generalize to more complex interactions
