Data Annotation for Classification

  • Slide 1
  • Data Annotation for Classification
  • Slide 2
  • Prediction: Develop a model which can infer a single aspect of the data (the predicted variable) from some combination of other aspects of the data (the predictor variables). Which students are off-task? Which students will fail the class?
  • Slide 3
  • Classification: Develop a model which can infer a categorical predicted variable from some combination of other aspects of the data. Which students will fail the class? Is the student currently gaming the system? Which type of gaming the system is occurring?
  • Slide 4
  • We will go into detail on classification methods tomorrow.
  • Slide 5
  • In order to use prediction methods, we need to know what we're trying to predict, and we need to have some labels of it in real data.
  • Slide 6
  • For example: If we want to predict whether a student using educational software is off-task, or gaming the system, or bored, or frustrated, or going to fail the class, we need to first collect some data. And within that data, we need to be able to identify which students are off-task (or whatever the construct of interest is), and ideally when.
  • Slide 7
  • So we need to label some data. We need to obtain outside knowledge to determine what the value is for the construct of interest.
  • Slide 8
  • In some cases, we can get a gold-standard label. For instance, if we want to know if a student passed a class, we just go ask their instructor.
  • Slide 9
  • But for behavioral constructs, there's no one to ask. We can't ask the student (self-presentation). There's no gold-standard metric. So we use data labeling methods or observation methods (e.g. quantitative field observations, video coding) to collect bronze-standard labels: not perfect, but good enough.
  • Slide 10
  • One such labeling method: text replay coding.
  • Slide 11
  • Text replays: pretty-prints of student interaction behavior from the logs.
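  • As a rough illustration of what a pretty-print of the logs can look like, here is a minimal Python sketch. The log fields and the render_clip helper are hypothetical, not the schema of any particular tutor's logs, and real text replay tools show considerably more context.

        # Hypothetical sketch: render a short clip of log rows as a human-readable text replay.
        # The field names (time, skill, action, input, outcome) are assumptions for illustration.
        def render_clip(rows):
            lines = []
            for r in rows:
                lines.append(f"{r['time']:>6.1f}s  skill={r['skill']:<10} {r['action']:<8} "
                             f"input={r['input']!r:<8} -> {r['outcome']}")
            return "\n".join(lines)

        clip = [
            {"time": 0.0, "skill": "slope", "action": "attempt", "input": "3", "outcome": "WRONG"},
            {"time": 2.4, "skill": "slope", "action": "attempt", "input": "4", "outcome": "WRONG"},
            {"time": 3.1, "skill": "slope", "action": "hint",    "input": "",  "outcome": "HINT 1"},
        ]
        print(render_clip(clip))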
  • Slide 12
  • Examples
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Sampling: You can set up any sampling scheme you want, if you have enough log data. 5-action sequences. 20-second sequences. Every behavior on a specific skill, with other skills omitted.
  • Slide 19
  • Sampling: Equal number of observations per lesson. Equal number of observations per student. Observations that machine learning software needs help to categorize (biased sampling).
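  • A rough sketch of how two of these sampling schemes might be drawn from log data, assuming a pandas DataFrame with hypothetical student, time, and action columns; this is illustrative only, not the workshop software's sampling code.

        import pandas as pd

        # Hypothetical log: one row per student action, with a timestamp in seconds.
        log = pd.DataFrame({
            "student": ["s1"] * 6 + ["s2"] * 6,
            "time":    [0, 5, 9, 14, 22, 30] * 2,
            "action":  ["attempt", "hint", "attempt", "attempt", "hint", "attempt"] * 2,
        })

        # Scheme 1: clips of 5 consecutive actions per student.
        five_action_clips = [g.iloc[i:i + 5]
                             for _, g in log.groupby("student")
                             for i in range(0, len(g), 5)]

        # Scheme 2: equal number of observations per student
        # (here, one randomly placed 20-second window each).
        def window(g, seconds=20):
            start = g["time"].sample(1).iloc[0]   # timestamp of a random starting action
            return g[(g["time"] >= start) & (g["time"] < start + seconds)]

        equal_per_student = [window(g) for _, g in log.groupby("student")]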
  • Slide 20
  • Major Advantages: Both video and field observations hold some risk of observer effects. Text replays are based on logs that were collected completely unobtrusively.
  • Slide 21
  • Major Advantages: Blazingly fast to conduct, at 8 to 40 seconds per observation.
  • Slide 22
  • Notes: Decent inter-rater reliability is possible (Baker, Corbett, & Wagner, 2006; Baker, Mitrovic, & Mathews, 2010; Sao Pedro et al., 2010; Montalvo et al., 2010). Labels agree with other measures of the same constructs (Baker, Corbett, & Wagner, 2006). Can be used to train machine-learned detectors (Baker & de Carvalho, 2008; Baker, Mitrovic, & Mathews, 2010; Sao Pedro et al., 2010).
  • Slide 23
  • Major Limitations: Limited range of constructs you can code. Gaming the System: yes. Collaboration in online chat: yes (Prata et al., 2008). Frustration, Boredom: sometimes. Off-Task Behavior outside of the software: no. Collaborative Behavior outside of the software: no.
  • Slide 24
  • Major Limitations: Lower precision (because of the lower bandwidth of observation).
  • Slide 25
  • Hands-on exercise
  • Slide 26
  • Find a partner. Could be your project team-mate, but doesn't have to be. You will do this exercise with them.
  • Slide 27
  • Get a copy of the text replay software: on your flash drive, or at http://www.joazeirodebaker.net/algebra-obspackage-LSRM.zip
  • Slide 28
  • Skim the instructions, in Instructions-LSRM.docx.
  • Slide 29
  • Log into the text replay software, using the exploratory login. Try to figure out what the student's behavior means, with your partner. Do this for ~5 minutes.
  • Slide 30
  • Now pick a category you want to code, with your partner.
  • Slide 31
  • Now code data according to your coding scheme (is-category versus is-not-category). Separate from your partner. For 20 minutes.
  • Slide 32
  • Now put your data together, using the observations-NAME files you obtained. Make a table (in Excel?) showing:
  • Slide 33
  •                Coder 1 Y   Coder 1 N
      Coder 2 Y        15           2
      Coder 2 N         3           8
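  • One way to build this table without Excel, as a hedged sketch: assuming each observations-NAME file can be read as a CSV with a clip identifier and a label column (the column names and the .csv extension are guesses, not the actual file format), pandas.crosstab produces the 2x2 count table directly.

        import pandas as pd

        # Assumed (not verified) layout: one row per coded clip, columns "clip" and "label".
        c1 = pd.read_csv("observations-CODER1.csv")
        c2 = pd.read_csv("observations-CODER2.csv")

        # Align the two coders' labels on the same clips, then cross-tabulate the codes.
        merged = c1.merge(c2, on="clip", suffixes=("_coder1", "_coder2"))
        print(pd.crosstab(merged["label_coder2"], merged["label_coder1"]))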
  • Slide 34
  • Now we can compute your inter-rater reliability (also called agreement).
  • Slide 35
  • Agreement/Accuracy: The easiest measure of inter-rater reliability is agreement, also called accuracy: (# of agreements) / (total number of codes).
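  • In code, agreement/accuracy from a count table is just the diagonal over the total. A minimal sketch (the counts are illustrative, not anyone's real data):

        # Agreement = (# of agreements) / (total number of codes)
        table = [[15, 2],   # coder 2 = Y: counts where coder 1 said Y, N (illustrative numbers)
                 [3,  8]]   # coder 2 = N
        agreements = table[0][0] + table[1][1]
        total = sum(sum(row) for row in table)
        print(agreements / total)   # 23 / 28, about 0.82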
  • Slide 36
  • Agreement/Accuracy: There is general agreement across fields that agreement/accuracy is not a good metric. What are some drawbacks of agreement/accuracy?
  • Slide 37
  • Agreement/Accuracy: Let's say that Tasha and Uniqua agreed on the classification of 9,200 time sequences, out of 10,000, for a coding scheme with two codes. 92% accuracy. Good, right?
  • Slide 38
  • Non-even assignment to categories: Percent agreement does poorly when there is non-even assignment to categories, which is almost always the case. Imagine an extreme case: Uniqua (correctly) picks category A 92% of the time; Tasha always picks category A. Agreement/accuracy of 92%, but essentially no information.
  • Slide 39
  • An alternate metric: Kappa = (Agreement - Expected Agreement) / (1 - Expected Agreement)
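  • A quick numeric check of the extreme case from the previous slide, using this formula (everything here is just arithmetic on the example's 92%/8% split):

        # Uniqua picks A 92% of the time; Tasha always picks A.
        agreement = 0.92                        # they agree whenever Uniqua says A
        expected  = 0.92 * 1.00 + 0.08 * 0.00   # chance agreement on A plus chance agreement on B
        kappa = (agreement - expected) / (1 - expected)
        print(kappa)   # 0.0: 92% agreement, but no information beyond chance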
  • Slide 40
  • Kappa: Expected agreement is computed from a table of the form:
                            Rater 2 Category 1   Rater 2 Category 2
      Rater 1 Category 1          Count                Count
      Rater 1 Category 2          Count                Count
  • Slide 41
  • Kappa: Expected agreement is computed from a table of the form below. Note that Kappa can be calculated for any number of categories (but only 2 raters).
                            Rater 2 Category 1   Rater 2 Category 2
      Rater 1 Category 1          Count                Count
      Rater 1 Category 2          Count                Count
  • Slide 42
  • Cohen's (1960) Kappa: the formula here is for 2 categories (and 2 raters). Fleiss's (1971) Kappa, which is more complex, can be used for 3+ raters. I have an Excel spreadsheet which calculates multi-category Kappa, which I would be happy to share with you.
  • Slide 43
  • Expected agreement: Look at the proportion of labels each coder gave to each category. To find the number of category-A agreements that could be expected by chance, multiply pct(coder1/categoryA) * pct(coder2/categoryA) by the total number of labels. Do the same thing for category B. Add these two values together and divide by the total number of labels. This is your expected agreement.
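  • This procedure is easy to script. Here is a minimal sketch of a kappa function that follows it and handles any number of categories (two raters); it illustrates the calculation described above, not the Excel spreadsheet mentioned earlier.

        def kappa(table):
            """Cohen's kappa from a square count table: table[i][j] is the number of
            clips rater 1 put in category i and rater 2 put in category j."""
            total = sum(sum(row) for row in table)
            # Observed agreement: diagonal counts over the total.
            observed = sum(table[i][i] for i in range(len(table))) / total
            # Expected agreement: for each category, multiply the proportion of labels
            # each rater gave to that category, then add across categories.
            expected = sum((sum(table[i]) / total) * (sum(row[i] for row in table) / total)
                           for i in range(len(table)))
            return (observed - expected) / (1 - expected)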
  • Slide 44
  • Example:
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20                 5
      Tyrone On-Task         15                60
  • Slide 45
  • Example: What is the percent agreement?
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20                 5
      Tyrone On-Task         15                60
  • Slide 46
  • Example: What is the percent agreement? 80%
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20                 5
      Tyrone On-Task         15                60
  • Slide 47
  • Example: What is Tyrone's expected frequency for on-task?
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20                 5
      Tyrone On-Task         15                60
  • Slide 48
  • Example: What is Tyrone's expected frequency for on-task? 75%
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20                 5
      Tyrone On-Task         15                60
  • Slide 49
  • Example: What is Pablo's expected frequency for on-task?
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20                 5
      Tyrone On-Task         15                60
  • Slide 50
  • Example: What is Pablo's expected frequency for on-task? 65%
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20                 5
      Tyrone On-Task         15                60
  • Slide 51
  • Example: What is the expected on-task agreement?
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20                 5
      Tyrone On-Task         15                60
  • Slide 52
  • Example: What is the expected on-task agreement? 0.65 * 0.75 = 0.4875
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20                 5
      Tyrone On-Task         15                60
  • Slide 53
  • Example: What is the expected on-task agreement? 0.65 * 0.75 = 0.4875
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20                 5
      Tyrone On-Task         15                60 (48.75)
  • Slide 54
  • Example: What are Tyrone's and Pablo's expected frequencies for off-task behavior?
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20                 5
      Tyrone On-Task         15                60 (48.75)
  • Slide 55
  • Example: What are Tyrone's and Pablo's expected frequencies for off-task behavior? 25% and 35%
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20                 5
      Tyrone On-Task         15                60 (48.75)
  • Slide 56
  • Example: What is the expected off-task agreement?
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20                 5
      Tyrone On-Task         15                60 (48.75)
  • Slide 57
  • Example: What is the expected off-task agreement? 0.25 * 0.35 = 0.0875
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20                 5
      Tyrone On-Task         15                60 (48.75)
  • Slide 58
  • Example: What is the expected off-task agreement? 0.25 * 0.35 = 0.0875
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20 (8.75)          5
      Tyrone On-Task         15                60 (48.75)
  • Slide 59
  • Example: What is the total expected agreement?
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20 (8.75)          5
      Tyrone On-Task         15                60 (48.75)
  • Slide 60
  • Example: What is the total expected agreement? 0.4875 + 0.0875 = 0.575
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20 (8.75)          5
      Tyrone On-Task         15                60 (48.75)
  • Slide 61
  • Example: What is kappa?
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20 (8.75)          5
      Tyrone On-Task         15                60 (48.75)
  • Slide 62
  • Example: What is kappa? (0.8 - 0.575) / (1 - 0.575) = 0.225 / 0.425 = 0.529
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20 (8.75)          5
      Tyrone On-Task         15                60 (48.75)
  • Slide 63
  • So is that any good? Kappa = (0.8 - 0.575) / (1 - 0.575) = 0.225 / 0.425 = 0.529
                        Pablo Off-Task   Pablo On-Task
      Tyrone Off-Task        20 (8.75)          5
      Tyrone On-Task         15                60 (48.75)
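  • A quick arithmetic check of this worked example (the counts 20, 5, 15, 60 are the ones in the table above):

        # Observed agreement, expected agreement, and kappa for the Tyrone/Pablo table.
        observed = (20 + 60) / 100                       # 0.80
        expected = 0.75 * 0.65 + 0.25 * 0.35             # 0.4875 + 0.0875 = 0.575
        print((observed - expected) / (1 - expected))    # about 0.529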
  • Slide 64
  • What is your Kappa?
  • Slide 65
  • Interpreting Kappa: Kappa = 0: agreement is at chance. Kappa = 1: agreement is perfect. Kappa = negative infinity: agreement is perfectly inverse. Kappa > 1: you messed up somewhere.
  • Slide 66
  • Kappa