
Approximating Learning Curves for Active-Learning-Driven Annotation

Katrin Tomanek and Udo Hahn

Jena University Language and Information Engineering (JULIE) Lab

Agenda

● Introduction to Active Learning

● Stopping Conditions

● Experiments & Results

Passive versus Active Selection

[Diagram: two annotation pipelines over a corpus, with examples moving from unlabeled to labeled]

● passive annotation scenario (aka Random Sampling): corpus → selection → annotation, with examples picked in arbitrary order

● active annotation scenario (aka Active Learning): corpus → „intelligent“ example selection → annotation

Committee-based AL Framework

[Diagram: AL loop between the labeled examples and the AL pool U of unlabeled examples]

(1) Train a committee of classifiers C1, C2, ..., Cn on subsets of the labeled examples (bagging)

(2) Each committee member predicts labels: for pool example uk, the predictions P1, P2, ..., Pn

(3) Calculate the disagreement Dk(P1, P2, ..., Pn) for each pool example uk

(4) Select the examples with the highest disagreement

(5) Annotate them and add them to the labeled examples (a code sketch of one such round follows)
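To make the loop concrete, below is a minimal Python sketch of one such round. It simplifies the slides' setting (sequence labeling over whole sentences) to plain scikit-learn-style classifiers and assumes vote entropy as the disagreement measure Dk; all names and parameters are illustrative, not taken from the slides.

    import numpy as np
    from collections import Counter
    from sklearn.base import clone

    def vote_entropy(votes):
        """Vote entropy of the committee votes on one example: 0 = full agreement."""
        counts = np.array(list(Counter(votes).values()), dtype=float)
        p = counts / counts.sum()
        return float(-(p * np.log(p)).sum())

    def al_round(base_clf, n_members, X_lab, y_lab, X_pool, batch_size=20, seed=0):
        """One committee-based AL round; returns pool indices to hand to the annotator."""
        rng = np.random.default_rng(seed)
        votes = []
        for _ in range(n_members):
            # (1) train a committee member on a bootstrap sample (bagging)
            idx = rng.integers(0, len(X_lab), size=len(X_lab))
            member = clone(base_clf).fit(X_lab[idx], y_lab[idx])
            # (2) predict labels for every example in the unlabeled AL pool
            votes.append(member.predict(X_pool))
        votes = np.stack(votes)  # shape: (n_members, pool size)
        # (3) disagreement D_k for every pool example u_k
        disagreement = np.array([vote_entropy(votes[:, k]) for k in range(votes.shape[1])])
        # (4) select the examples with the highest disagreement; (5) annotate them
        return np.argsort(disagreement)[-batch_size:]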

Reduction of Annotation Effort

[Plot: learning curves for Active Learning vs. Random Sampling]

● AL reaches F = 0.83 after ~60K tokens, Random Sampling only after ~130K tokens

➔ (130K − 60K) / 130K ≈ 54 %, i.e. > 50 % reduction of annotation effort

When to Stop the Annotation?

[Plot: learning curve with three candidate stopping points, marked "bad", "bad", and "better"]

● Stopping too early is bad: one could still gain a lot by further annotation

● Stopping too late is also bad: further annotation will not increase classifier performance significantly

● Better: stop just where the learning curve converges

Stopping Condition based on Learning Curve?

● Pro: stopping condition directly based on classifier performance

● Contra: requires a labeled gold standard

➔ not applicable in practice, as a gold standard is not available

● Goal:

– Estimate the (progression of the) learning curve without the need for a gold standard

Approximating the Learning Curve

● Approach:

– Based on the agreement among committee members

– Does not require extra labeling effort

– The agreement curve approximates the progression of the learning curve

➔ From it, we can tell the relative position in the annotation process:

● the relative trade-off between annotation effort and the gain in classifier performance

– Steep slope?

– Convergence?

Approximating the Learning Curve

● Intuition:

– Agreement among the committee members is:

● low in early AL iterations

● high in later ones

➔ When the agreement among the committee members converges, the learning curve converges too (a sketch of one possible agreement value follows below)
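As a minimal sketch, one way to compute such an agreement value per AL iteration is shown below; majority-vote agreement and the scikit-learn-style predict() interface are assumptions for illustration, the slides do not prescribe a specific measure.

    import numpy as np

    def committee_agreement(committee, X_val):
        """Mean agreement of a trained committee on a fixed example set X_val:
        the fraction of members voting for the majority label, averaged over
        all examples. Note that no gold labels are needed."""
        votes = np.stack([clf.predict(X_val) for clf in committee])  # (n_members, |X_val|)
        per_example = [
            np.unique(votes[:, k], return_counts=True)[1].max() / votes.shape[0]
            for k in range(votes.shape[1])
        ]
        return float(np.mean(per_example))

Recording this value after every AL iteration yields the agreement curve.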

Approximating the Learning Curve

● Where to calculate the agreement:

– On a separate validation set

● not involved in the AL selection process itself

● agreement values comparable across different AL iterations

– Otherwise the agreement curve is often not a reliable approximation, due to the „simulation dilemma“:

● When the agreement is calculated e.g. on the examples selected in each AL iteration:

– the approximation of the learning curve usually works well in simulation scenarios, because...

» few hard cases are left in later AL iterations (perfect agreement)

– but it fails in real-world annotation scenarios, because...

» in practice, AL will always find tricky cases (a convergence check over the agreement curve is sketched below)
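Given one such agreement value per AL iteration, computed on the separate validation set, a convergence check could look like the sketch below; the window size and threshold eps are illustrative assumptions, not values from the slides.

    def agreement_converged(agreement_curve, window=5, eps=0.005):
        """Heuristic stopping check: the agreement curve (one value per AL
        iteration) counts as converged once it has gained less than eps
        over the last `window` iterations."""
        if len(agreement_curve) <= window:
            return False
        return agreement_curve[-1] - agreement_curve[-1 - window] < eps

Annotation would then stop once this check fires, ideally for a few consecutive iterations to guard against short plateaus.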

Experiments

● For annotation of Named Entity mentions

● Whole sentences selected (20 each round)

● Simulation on CoNLL-2003 corpus

– Newspaper text, MUC entities (PERS, LOC, ORG)

– AL pool: ~ 14,000 sentences

– Gold Standard: ~ 3,500 sentences

● For the learning curve

● For the agreement curve (labels ignored)

Results

[Plots: learning curves and the agreement curve over the course of annotation; the agreement curve mirrors the progression of the learning curves]

Summary & Conclusions

● AL has high potential to reduce annotation effort

● A proper stopping point is necessary to profit from the savings

➔ A method to monitor the progress of annotation is needed

● Agreement curve

– Works well: a good approximation of the learning curve

– No extra annotation effort: does not require a labeled gold standard

Approximating Learning Curves for Active-Learning-Driven Annotation

Thanks. Questions?

http://www.julielab.de/
