Prediction
Develop a model which can infer a single aspect of the data (the predicted variable) from some combination of other aspects of the data (predictor variables).
Which students are off-task? Which students will fail the class?
Slide 3
Classification
Develop a model which can infer a categorical predicted variable from some combination of other aspects of the data.
Which students will fail the class? Is the student currently gaming the system? Which type of gaming the system is occurring?
Slide 4
We will go into detail on classification methods tomorrow.
Slide 5
In order to use prediction methods, we need to know what we're trying to predict, and we need to have some labels for it in real data.
Slide 6
For example: if we want to predict whether a student using educational software is off-task, or gaming the system, or bored, or frustrated, or going to fail the class, we need to first collect some data. And within that data, we need to be able to identify which students are off-task (or show whichever construct is of interest), and ideally when.
Slide 7
So we need to label some data: we need to obtain outside knowledge to determine what the value is for the construct of interest.
Slide 8
In some cases, we can get a gold-standard label. For instance, if we want to know whether a student passed a class, we can just ask their instructor.
Slide 9
But for behavioral constructs, there's no one to ask. We can't ask the student (self-presentation). There's no gold-standard metric. So we use data labeling methods or observation methods (e.g. quantitative field observations, video coding) to collect bronze-standard labels: not perfect, but good enough.
Slide 10
One such labeling method: text replay coding.
Slide 11
Text replays
Pretty-prints of student interaction behavior from the logs.
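For illustration: a text replay is just a human-readable rendering of a short slice of log rows. A minimal sketch of how one could be generated (this is not the actual LSRM tool; the log fields time/action/input/outcome are hypothetical):

    # Minimal sketch of a text-replay pretty-printer; the log fields
    # (time, action, input, outcome) are hypothetical, not the LSRM format.
    def text_replay(log_rows):
        """Render a clip of student log rows for human coding."""
        return "\n".join(
            f"{row['time']:>6.1f}s  {row['action']:<13} "
            f"input={row['input']!r}  outcome={row['outcome']}"
            for row in log_rows
        )

    clip = [
        {"time": 0.0, "action": "attempt",      "input": "3x",   "outcome": "WRONG"},
        {"time": 2.4, "action": "attempt",      "input": "3x+1", "outcome": "WRONG"},
        {"time": 3.1, "action": "help-request", "input": "",     "outcome": "HINT"},
    ]
    print(text_replay(clip))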
Slide 12
Examples
Slides 13-17
[Example text replays, shown as screenshots]
Slide 18
Sampling
You can set up any sampling schema you want, if you have enough log data:
5-action sequences
20-second sequences
Every behavior on a specific skill, with other skills omitted
Slide 19
Sampling
Equal number of observations per lesson
Equal number of observations per student
Observations that machine learning software needs help to categorize (biased sampling)
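One of these schemas in code, as a rough sketch: drawing an equal number of randomly chosen 5-action clips per student. The column names ('student', 'time') are assumptions about the log format, not the tutorial's actual schema.

    # Sketch: an equal number of randomly chosen 5-action clips per student.
    import random
    import pandas as pd

    def sample_clips(log, clips_per_student=10, clip_len=5, seed=0):
        """Draw up to clips_per_student clips of clip_len actions per student."""
        rng = random.Random(seed)
        clips = []
        for _, rows in log.groupby("student"):
            rows = rows.sort_values("time").reset_index(drop=True)
            starts = list(range(len(rows) - clip_len + 1))
            for start in rng.sample(starts, min(clips_per_student, len(starts))):
                clips.append(rows.iloc[start:start + clip_len])
        return clips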
Slide 20
Major Advantages
Both video and field observations hold some risk of observer effects. Text replays are based on logs that were collected completely unobtrusively.
Slide 21
Major Advantages
Blazing fast to conduct: 8 to 40 seconds per observation.
Slide 22
Notes
Decent inter-rater reliability is possible (Baker, Corbett, & Wagner, 2006; Baker, Mitrovic, & Mathews, 2010; Sao Pedro et al., 2010; Montalvo et al., 2010).
Agree with other measures of constructs (Baker, Corbett, & Wagner, 2006).
Can be used to train machine-learned detectors (Baker & de Carvalho, 2008; Baker, Mitrovic, & Mathews, 2010; Sao Pedro et al., 2010).
Slide 23
Major Limitations
Limited range of constructs you can code:
Gaming the System: yes
Collaboration in online chat: yes (Prata et al., 2008)
Frustration, Boredom: sometimes
Off-Task Behavior outside of software: no
Collaborative Behavior outside of software: no
Slide 24
Major Limitations
Lower precision (because of the lower bandwidth of observation).
Slide 25
Hands-on exercise
Slide 26
Find a partner. Could be your project team-mate, but doesn't have to be. You will do this exercise with them.
Slide 27
Get a copy of the text replay software: on your flash drive, or at http://www.joazeirodebaker.net/algebra-obspackage-LSRM.zip
Slide 28
Skim the instructions, at Instructions-LSRM.docx.
Slide 29
Log into the text replay software, using the exploratory login. Try to figure out what the student's behavior means, with your partner. Do this for ~5 minutes.
Slide 30
Now pick a category you want to code, with your partner.
Slide 31
Now code data according to your coding scheme (is-category versus is-not-category). Separate from your partner. For 20 minutes.
Slide 32
Now put your data together, using the observations-NAME files you obtained. Make a table (in Excel?) showing:
Slide 33
                Coder 1 Y   Coder 1 N
    Coder 2 Y      15           2
    Coder 2 N       3           8
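If you would rather build the table in code than in Excel, here is a sketch. The file layout is an assumption (one row per clip, same clip order in both files, code in a 'code' column), and observations-ALICE / observations-BOB are placeholder names following the observations-NAME pattern.

    # Sketch of assembling the coder-by-coder table from the two files.
    import pandas as pd

    coder1 = pd.read_csv("observations-ALICE.csv")["code"]
    coder2 = pd.read_csv("observations-BOB.csv")["code"]
    print(pd.crosstab(coder1, coder2,
                      rownames=["Coder 1"], colnames=["Coder 2"]))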
Slide 34
Now we can compute your inter-rater reliability (also called agreement).
Slide 35
Agreement/Accuracy
The easiest measure of inter-rater reliability is agreement, also called accuracy:
agreement = (# of agreements) / (total number of codes)
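A minimal sketch of this computation (the function name is illustrative):

    # Agreement/accuracy: # of agreements divided by total # of codes.
    def agreement(codes1, codes2):
        return sum(a == b for a, b in zip(codes1, codes2)) / len(codes1)

    print(agreement(["Y", "Y", "N", "Y"], ["Y", "N", "N", "Y"]))  # 0.75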
Slide 36
Agreement/Accuracy
There is general agreement across fields that agreement/accuracy is not a good metric. What are some drawbacks of agreement/accuracy?
Slide 37
Agreement/Accuracy
Let's say that Tasha and Uniqua agreed on the classification of 9,200 sequences out of 10,000, for a coding scheme with two codes. 92% accuracy. Good, right?
Slide 38
Non-even assignment to categories
Percent agreement does poorly when there is non-even assignment to categories, which is almost always the case. Imagine an extreme case: Uniqua (correctly) picks category A 92% of the time; Tasha always picks category A. Agreement/accuracy of 92%, but essentially no information.
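A quick numeric check of this scenario, anticipating the Kappa metric on the next slide: chance agreement alone is just as high as the observed agreement, so the 92% carries no information.

    # The scenario in numbers: Uniqua codes 'A' 92% of the time,
    # Tasha codes 'A' every time.
    uniqua = ["A"] * 92 + ["B"] * 8
    tasha = ["A"] * 100

    agree = sum(u == t for u, t in zip(uniqua, tasha)) / 100  # 0.92
    # Chance agreement: P(both pick A) + P(both pick B)
    expected = 0.92 * 1.0 + 0.08 * 0.0                        # 0.92
    kappa = (agree - expected) / (1 - expected)
    print(agree, expected, kappa)                             # 0.92 0.92 0.0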
Slide 39
An alternate metric: Kappa
Kappa = (Agreement - Expected Agreement) / (1 - Expected Agreement)
Slide 40
Kappa
Expected agreement is computed from a table of the form:

                         Rater 2 Category 1   Rater 2 Category 2
    Rater 1 Category 1         Count                Count
    Rater 1 Category 2         Count                Count
Slide 41
Kappa
Expected agreement is computed from a table of the form below. Note that Kappa can be calculated for any number of categories (but only 2 raters).

                         Rater 2 Category 1   Rater 2 Category 2
    Rater 1 Category 1         Count                Count
    Rater 1 Category 2         Count                Count
Slide 42
Cohen's (1960) Kappa
The formula here is for 2 categories. Fleiss's (1971) Kappa, which is more complex, can be used for 3+ raters. I have an Excel spreadsheet which calculates multi-category Kappa, which I would be happy to share with you.
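For reference, a minimal Python sketch along the lines of that spreadsheet: two-rater Cohen's Kappa for any number of categories (the function name is mine, not from the tutorial materials).

    # Two-rater Cohen's Kappa for any number of categories.
    from collections import Counter

    def cohens_kappa(codes1, codes2):
        n = len(codes1)
        agree = sum(a == b for a, b in zip(codes1, codes2)) / n
        freq1, freq2 = Counter(codes1), Counter(codes2)
        expected = sum((freq1[c] / n) * (freq2[c] / n)
                       for c in set(codes1) | set(codes2))
        return (agree - expected) / (1 - expected)

    # Three categories, two raters:
    print(cohens_kappa(list("AABBC"), list("AABCC")))  # ~0.706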
Slide 43
Expected agreement
Look at the proportion of labels each coder gave to each category. To find the number of category-A agreements that could be expected by chance, multiply pct(coder1/categoryA) * pct(coder2/categoryA) * (total number of labels). Do the same thing for category B. Add these two expected counts together and divide by the total number of labels. This is your expected agreement.
Slide 44
Example

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task         20                5
    Tyrone On-Task          15               60
Slide 45
Example
What is the percent agreement?

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task         20                5
    Tyrone On-Task          15               60
Slide 46
Example
What is the percent agreement? 80%

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task         20                5
    Tyrone On-Task          15               60
Slide 47
Example
What is Tyrone's expected frequency for on-task?

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task         20                5
    Tyrone On-Task          15               60
Slide 48
Example
What is Tyrone's expected frequency for on-task? 75%

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task         20                5
    Tyrone On-Task          15               60
Slide 49
Example
What is Pablo's expected frequency for on-task?

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task         20                5
    Tyrone On-Task          15               60
Slide 50
Example
What is Pablo's expected frequency for on-task? 65%

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task         20                5
    Tyrone On-Task          15               60
Slide 51
Example
What is the expected on-task agreement?

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task         20                5
    Tyrone On-Task          15               60
Slide 52
Example
What is the expected on-task agreement? 0.65 * 0.75 = 0.4875

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task         20                5
    Tyrone On-Task          15               60
Slide 53
Example
What is the expected on-task agreement? 0.65 * 0.75 = 0.4875

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task         20                5
    Tyrone On-Task          15               60 (48.75)
Slide 54
Example
What are Tyrone's and Pablo's expected frequencies for off-task behavior?

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task         20                5
    Tyrone On-Task          15               60 (48.75)
Slide 55
Example
What are Tyrone's and Pablo's expected frequencies for off-task behavior? 25% and 35%

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task         20                5
    Tyrone On-Task          15               60 (48.75)
Slide 56
Example
What is the expected off-task agreement?

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task         20                5
    Tyrone On-Task          15               60 (48.75)
Slide 57
Example
What is the expected off-task agreement? 0.25 * 0.35 = 0.0875

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task         20                5
    Tyrone On-Task          15               60 (48.75)
Slide 58
Example
What is the expected off-task agreement? 0.25 * 0.35 = 0.0875

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task      20 (8.75)           5
    Tyrone On-Task          15               60 (48.75)
Slide 59
Example
What is the total expected agreement?

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task      20 (8.75)           5
    Tyrone On-Task          15               60 (48.75)
Slide 60
Example
What is the total expected agreement? 0.4875 + 0.0875 = 0.575

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task      20 (8.75)           5
    Tyrone On-Task          15               60 (48.75)
Slide 61
Example
What is kappa?

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task      20 (8.75)           5
    Tyrone On-Task          15               60 (48.75)
Slide 62
Example
What is kappa? (0.8 - 0.575) / (1 - 0.575) = 0.225 / 0.425 = 0.529

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task      20 (8.75)           5
    Tyrone On-Task          15               60 (48.75)
Slide 63
So is that any good?
Kappa = (0.8 - 0.575) / (1 - 0.575) = 0.225 / 0.425 = 0.529

                      Pablo Off-Task   Pablo On-Task
    Tyrone Off-Task      20 (8.75)           5
    Tyrone On-Task          15               60 (48.75)
Slide 64
What is your Kappa?
Slide 65
Interpreting Kappa
Kappa = 0: Agreement is at chance
Kappa = 1: Agreement is perfect
Kappa = negative infinity: Agreement is perfectly inverse
Kappa > 1: You messed up somewhere