Proactive Learning with Multiple Class-Sensitive Labelers
Seungwhan (Shane) Moon, Jaime Carbonell
School of Computer Science, Carnegie Mellon University
DSAA 2014 Conference 10/30/2014
Unlabeled Data is Abundant
• Imagine building a Vehicle classifier
Scarcity of labeled data
Active Learning
The learner actively chooses which unlabeled instances to query for labels, aiming to learn an accurate classifier from as few labels as possible.
Query Strategies
• Uncertainty Sampling
• Query by Committee
• Entropy Based Sampling
• Density Weighted Methods
• and more …
Uncertainty Sampling

[Figure: Label 1 and Label 2 points with unlabeled points and the current decision boundary; the unlabeled points closest to the boundary are the most uncertain, and are queried first]
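A minimal sketch of margin-based uncertainty sampling, assuming a scikit-learn-style classifier exposing predict_proba; the function and variable names are illustrative, not from the talk:

```python
import numpy as np

def uncertainty_query(model, X_unlabeled, n_queries=1):
    """Pick the unlabeled points the current model is least sure about.

    A small margin between the top-two class posteriors means the point
    lies near the current decision boundary.
    """
    proba = model.predict_proba(X_unlabeled)            # (n_samples, n_classes)
    sorted_proba = np.sort(proba, axis=1)
    margin = sorted_proba[:, -1] - sorted_proba[:, -2]  # top-1 minus top-2
    return np.argsort(margin)[:n_queries]               # smallest margins first
```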
Assumptions in Traditional Active Learning
• Annotator(s) always give perfect answers (oracle)
• There is no difference in cost for querying different annotators
Proactive Learning [Carbonell et al.]
• Relaxes the following assumptions:
• Only a single annotator gives labels
• Annotators always give perfect answers
• Querying any annotator costs the same (cost insensitivity)
→ utility optimization under a budget constraint
Multiple annotators: they have different labeling accuracies (expertise) and incur different costs.
Key Component: Estimating Labeler Accuracy
P(ans | x, k): the probability of getting a right answer for an unlabeled instance x from an expert k
Limitation in previous literature on proactive learning: labeler accuracy is assumed to be independent of the class label in multi-class problems.
Proactive Learning with Multiple Domain Experts: Analogy
Motivation
Diagnosis of a patient with an unknown disease (uncertainty in the data)
Given multiple physicians with different specializations (multiple class-sensitive experts)
If we know the patient has seemingly cancer-like symptoms (posterior class probability)
And that an oncologist treats cancer issues (estimated labeler accuracy given a specific class)
Then it is better to delegate the task to the respective expert.
Proactive Learning with Multiple Domain Experts
Problem Formulation (Objective): select instance-expert query pairs to maximize the expected information value, subject to a total annotation budget
Greedy Approximation
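One plausible rendering of this objective, following the budget-constrained proactive-learning formulation of Donmez & Carbonell; the symbols S (the chosen query set), C_k (expert k's per-query cost), and B (the budget) are assumptions from the surrounding slides, and the paper's exact notation may differ:

```latex
% Choose a set S of (instance, expert) queries that maximizes expected
% information value under a total annotation budget B.
\max_{S} \sum_{(x,k) \in S} V(x)\, P(\mathrm{ans} \mid x, k)
\quad \text{s.t.} \quad \sum_{(x,k) \in S} C_k \le B
```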
Proactive Learning with Multiple Domain Experts
Utility Criteria for Greedy Approximation
Jointly optimize for an instance-expert pair which:
- has high information value V(x) (instance)
- has high probability of getting the right answer (both)
- has low cost of annotation (expert)
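A minimal sketch of one greedy step under these three criteria, assuming a multiplicative utility U(x, k) = V(x) · P(ans | x, k) / C_k; the slide names the criteria but not the exact utility function, so this scalarization is an assumption:

```python
import numpy as np

def greedy_select(V, P_ans, cost):
    """Greedily pick the single best (instance, expert) pair.

    V      : (n_instances,)            information value of each instance
    P_ans  : (n_instances, n_experts)  estimated P(ans | x, k)
    cost   : (n_experts,)              per-query cost C_k of each expert

    Utility = value x answer probability per unit cost, one plausible
    scalarization of the slide's three criteria.
    """
    utility = V[:, None] * P_ans / cost[None, :]
    i, k = np.unravel_index(np.argmax(utility), utility.shape)
    return i, k  # query instance i with expert k
```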
Expert Estimation: Estimating Expertise of Labeling Sources

P(ans | x, k) = Σ_{c ∈ C} P(c | x) · P(ans | c, k)

summed over the set of categories C, where P(c | x) is the class posterior probability of the label for sample x being c, and P(ans | c, k) is the estimated probability of expert k answering correctly for label c.

Per-class Reduced Estimation: estimate P(ans | c, k) separately for each class, rather than a single class-independent accuracy per expert.
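A sketch of both steps, assuming per-class accuracies are estimated from a small set of probe queries with known answers (the experiments later also use majority votes for this); all names are illustrative:

```python
import numpy as np

def p_answer(class_posterior, per_class_acc):
    """P(ans | x, k) = sum_c P(c | x) * P(ans | c, k).

    class_posterior : (n_instances, n_classes)  P(c | x) from the model
    per_class_acc   : (n_classes, n_experts)    estimated P(ans | c, k)
    returns         : (n_instances, n_experts)
    """
    return class_posterior @ per_class_acc

def estimate_per_class_acc(true_labels, expert_labels, n_classes, smoothing=1.0):
    """Estimate P(ans | c, k) per class from probes with known answers,
    with Laplace smoothing for classes an expert was rarely probed on."""
    n_experts = expert_labels.shape[1]
    hits = np.full((n_classes, n_experts), smoothing)
    totals = np.full((n_classes, n_experts), 2.0 * smoothing)
    for c in range(n_classes):
        mask = true_labels == c
        hits[c] += (expert_labels[mask] == c).sum(axis=0)
        totals[c] += mask.sum()
    return hits / totals
```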
Density Based Sampling for Multi-classification Tasks

[Figure: a three-class example (Label 1, Label 2, Label 3) with unlabeled points and the current decision boundaries]
Density Based Sampling for Multi-classification Tasks
Def: Multi-class Information Density (MCID)
The final value function combines:
(1) Density
(2) Unknownness
(3) Conflictivity
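A heavily hedged sketch of an MCID-style score: the slide names the three ingredients but not their functional form, so the proxies below (posterior entropy for unknownness, inverse top-two margin for conflictivity) and the multiplicative combination are assumptions, not the paper's value function:

```python
import numpy as np

def mcid_score(log_density, class_posterior, eps=1e-12):
    """Assumed MCID-style value: density x unknownness x conflictivity.

    log_density     : (n,)            log p(x) from a density model (e.g. a GMM)
    class_posterior : (n, n_classes)  P(c | x) from the current classifier
    """
    density = np.exp(log_density)
    # Unknownness proxy: entropy of the class posterior (high = model unsure).
    unknownness = -np.sum(class_posterior * np.log(class_posterior + eps), axis=1)
    # Conflictivity proxy: closeness of the top two classes (1 = full conflict).
    top2 = np.sort(class_posterior, axis=1)[:, -2:]
    conflictivity = 1.0 - (top2[:, 1] - top2[:, 0])
    return density * unknownness * conflictivity
```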
Density Based Sampling for Multi-classification Tasks
Induce density using a Gaussian Mixture Model
Estimation via an EM procedure
Each mixture component sharing the same variance
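A minimal sketch of this density step with scikit-learn, where covariance_type="tied" makes every mixture component share one covariance matrix (matching the shared-variance constraint) and fit runs EM internally; the helper name is illustrative:

```python
from sklearn.mixture import GaussianMixture

def fit_log_density(X_unlabeled, n_components=10, seed=0):
    """Fit a shared-covariance GMM to the unlabeled pool via EM and
    return the log-density of each sample."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="tied",
                          random_state=seed)
    gmm.fit(X_unlabeled)
    return gmm.score_samples(X_unlabeled)  # log p(x) per sample
```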
So far: a new proactive learning algorithm for multiple domain experts
Multi-class Information Density (MCID) as a query strategy
Experiments: Datasets
Simulated noisy labelers (except for the Diabetes dataset)
Narrow experts: a classifier trained over a partially noised dataset (expertise in only a subset of classes)
Meta expert: a classifier trained over the entire dataset
Baselines
Best Avg: the learner always asks the narrow expert with the highest average P(ans | x, k)
Meta: the learner always asks the meta oracle (expensive)
BestAvg+Meta: joint optimization under the uniform reliability assumption (Donmez et al., 2012)
*Narrow: joint optimization using our algorithm
*Narrow+Meta: our algorithm with a meta oracle present as well
Classification Performance Over Iterations
Cost Ratio of Narrow vs. Meta Experts: 1:6
Classification Performance for Different Cost Ratios
On other datasets
Classification Performance vs. Budget Allocated for Expertise Estimation
- Works both when ground-truth samples are available and when expertise is estimated via majority votes
- Estimates expertise well enough with ~10% of the budget
Conclusions
• A new proactive learning algorithm with multiple class-sensitive labelers performs better than the baselines
• Efficient estimation of each expert's expertise via the reduced per-class method
• Multi-class Information Density (MCID) as a new active learning criterion for noisy multi-class active learning
Future Work
• Theoretical min-max bounds of the proposed algorithm, under different reliabilities and costs of the experts
• Extend the framework to a crowdsourcing scenario with a larger pool of experts
Proactive Learning with Multiple Class-Sensitive Labelers
Seungwhan Moon, Jaime Carbonell
Language Technology Institute, School of Computer Science, Carnegie Mellon University
DSAA 2014 Conference 10/30/2014
MCID Performance
Performance when expertise was estimated via Majority Vote
Proactive Learning Algorithm
Expertise Estimation