Proactive Learning with Multiple Class-Sensitive Labelers
Seungwhan (Shane) Moon, Jaime Carbonell
School of Computer Science, Carnegie Mellon University
DSAA 2014 Conference 10/30/2014
Unlabeled Data is Abundant
• Imagine building a Vehicle classifier
Scarcity of labeled data
Active Learning
The learner actively chooses which unlabeled instances to query for labels, aiming to learn an accurate classifier from as few labels as possible.
Query Strategies
• Uncertainty Sampling
• Query by Committee
• Entropy Based Sampling
• Density Weighted Methods
• and more …
Uncertainty Sampling

[Figure: Label 1 and Label 2 points with unlabeled points and the current decision boundary; the unlabeled points closest to the boundary are the most uncertain, and are queried first]
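A minimal sketch of margin-based uncertainty sampling, assuming a scikit-learn-style classifier exposing predict_proba; the function and variable names are illustrative, not from the talk:

```python
import numpy as np

def uncertainty_query(model, X_unlabeled, n_queries=1):
    """Pick the unlabeled points the current model is least sure about.

    A small margin between the top-two class posteriors means the point
    lies near the current decision boundary.
    """
    proba = model.predict_proba(X_unlabeled)            # (n_samples, n_classes)
    sorted_proba = np.sort(proba, axis=1)
    margin = sorted_proba[:, -1] - sorted_proba[:, -2]  # top-1 minus top-2
    return np.argsort(margin)[:n_queries]               # smallest margins first
```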
Assumptions in Traditional Active Learning
• Annotator(s) always give perfect answers (oracle)
• There is no difference in cost for querying different annotators
Proactive Learning [Carbonell et al.]
• Relaxes the following assumptions:
• Only a single annotator gives labels
• Annotators always give perfect answers
• Querying any annotator costs the same (cost insensitivity)
→ utility optimization under a budget constraint
Multiple annotators: they have different labeling accuracies (expertise) and incur different costs.
Key Component: Estimating Labeler Accuracy
P(ans | x, k): the probability of getting a right answer for an unlabeled instance x from an expert k
Limitation in previous literature on proactive learning: labeler accuracy is assumed to be independent of the class label in multi-class problems.
Proactive Learning with Multiple Domain Experts: Analogy
Motivation
Diagnosis of a patient with an unknown disease (uncertainty in the data)
Given multiple physicians with different specializations (multiple class-sensitive experts)
If we know the patient has seemingly cancer-like symptoms (posterior class probability)
And that an oncologist treats cancer issues (estimated labeler accuracy given a specific class)
Then it is better to delegate the task to the respective expert.
Proactive Learning with Multiple Domain Experts
Problem Formulation (Objective): select instance-expert query pairs to maximize the expected information value, subject to a total annotation budget
Greedy Approximation
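One plausible rendering of this objective, following the budget-constrained proactive-learning formulation of Donmez & Carbonell; the symbols S (the chosen query set), C_k (expert k's per-query cost), and B (the budget) are assumptions from the surrounding slides, and the paper's exact notation may differ:

```latex
% Choose a set S of (instance, expert) queries that maximizes expected
% information value under a total annotation budget B.
\max_{S} \sum_{(x,k) \in S} V(x)\, P(\mathrm{ans} \mid x, k)
\quad \text{s.t.} \quad \sum_{(x,k) \in S} C_k \le B
```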
Proactive Learning with Multiple Domain Experts
Utility Criteria for Greedy Approximation
Jointly optimize for an instance-expert pair which:
- has high information value V(x) (instance)
- has high probability of getting the right answer (both)
- has low cost of annotation (expert)
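A minimal sketch of one greedy step under these three criteria, assuming a multiplicative utility U(x, k) = V(x) · P(ans | x, k) / C_k; the slide names the criteria but not the exact utility function, so this scalarization is an assumption:

```python
import numpy as np

def greedy_select(V, P_ans, cost):
    """Greedily pick the single best (instance, expert) pair.

    V      : (n_instances,)            information value of each instance
    P_ans  : (n_instances, n_experts)  estimated P(ans | x, k)
    cost   : (n_experts,)              per-query cost C_k of each expert

    Utility = value x answer probability per unit cost, one plausible
    scalarization of the slide's three criteria.
    """
    utility = V[:, None] * P_ans / cost[None, :]
    i, k = np.unravel_index(np.argmax(utility), utility.shape)
    return i, k  # query instance i with expert k
```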
Expert Estimation: Estimating Expertise of Labeling Sources

P(ans | x, k) = Σ_{c ∈ C} P(c | x) · P(ans | c, k)

summed over the set of categories C, where P(c | x) is the class posterior probability of the label for sample x being c, and P(ans | c, k) is the estimated probability of expert k answering correctly for label c.

Per-class Reduced Estimation: estimate P(ans | c, k) separately for each class, rather than a single class-independent accuracy per expert.
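A sketch of both steps, assuming per-class accuracies are estimated from a small set of probe queries with known answers (the experiments later also use majority votes for this); all names are illustrative:

```python
import numpy as np

def p_answer(class_posterior, per_class_acc):
    """P(ans | x, k) = sum_c P(c | x) * P(ans | c, k).

    class_posterior : (n_instances, n_classes)  P(c | x) from the model
    per_class_acc   : (n_classes, n_experts)    estimated P(ans | c, k)
    returns         : (n_instances, n_experts)
    """
    return class_posterior @ per_class_acc

def estimate_per_class_acc(true_labels, expert_labels, n_classes, smoothing=1.0):
    """Estimate P(ans | c, k) per class from probes with known answers,
    with Laplace smoothing for classes an expert was rarely probed on."""
    n_experts = expert_labels.shape[1]
    hits = np.full((n_classes, n_experts), smoothing)
    totals = np.full((n_classes, n_experts), 2.0 * smoothing)
    for c in range(n_classes):
        mask = true_labels == c
        hits[c] += (expert_labels[mask] == c).sum(axis=0)
        totals[c] += mask.sum()
    return hits / totals
```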
Density Based Sampling for Multi-classification Tasks

[Figure: a three-class example (Label 1, Label 2, Label 3) with unlabeled points and the current decision boundaries]
Density Based Sampling for Multi-classification Tasks
Def: Multi-class Information Density (MCID)
The final value function combines:
(1) Density
(2) Unknownness
(3) Conflictivity
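A heavily hedged sketch of an MCID-style score: the slide names the three ingredients but not their functional form, so the proxies below (posterior entropy for unknownness, inverse top-two margin for conflictivity) and the multiplicative combination are assumptions, not the paper's value function:

```python
import numpy as np

def mcid_score(log_density, class_posterior, eps=1e-12):
    """Assumed MCID-style value: density x unknownness x conflictivity.

    log_density     : (n,)            log p(x) from a density model (e.g. a GMM)
    class_posterior : (n, n_classes)  P(c | x) from the current classifier
    """
    density = np.exp(log_density)
    # Unknownness proxy: entropy of the class posterior (high = model unsure).
    unknownness = -np.sum(class_posterior * np.log(class_posterior + eps), axis=1)
    # Conflictivity proxy: closeness of the top two classes (1 = full conflict).
    top2 = np.sort(class_posterior, axis=1)[:, -2:]
    conflictivity = 1.0 - (top2[:, 1] - top2[:, 0])
    return density * unknownness * conflictivity
```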
Density Based Sampling for Multi-classification Tasks
Induce density using a Gaussian Mixture Model
Estimation via an EM procedure
Each mixture component sharing the same variance
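A minimal sketch of this density step with scikit-learn, where covariance_type="tied" makes every mixture component share one covariance matrix (matching the shared-variance constraint) and fit runs EM internally; the helper name is illustrative:

```python
from sklearn.mixture import GaussianMixture

def fit_log_density(X_unlabeled, n_components=10, seed=0):
    """Fit a shared-covariance GMM to the unlabeled pool via EM and
    return the log-density of each sample."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="tied",
                          random_state=seed)
    gmm.fit(X_unlabeled)
    return gmm.score_samples(X_unlabeled)  # log p(x) per sample
```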
So far: a new proactive learning algorithm for multiple domain experts
Multi-class Information Density (MCID) as a query strategy
Experiments: Datasets
Simulated noisy labelers (except for the Diabetes dataset)
Narrow experts: a classifier trained over a partially noised dataset (expertise in only a subset of classes)
Meta expert: a classifier trained over the entire dataset
Baselines
Best Avg: the learner always asks the narrow expert with the highest average P(ans | x, k)
Meta: the learner always asks the meta oracle (expensive)
BestAvg+Meta: joint optimization under the uniform reliability assumption (Donmez et al., 2012)
*Narrow: joint optimization using our algorithm
*Narrow+Meta: our algorithm with a meta oracle present as well
Classification Performance Over Iterations
Cost Ratio of Narrow vs. Meta Experts: 1:6
Classification Performance for Different Cost Ratios
On other datasets
Classification Performance vs. Budget Allocated for Expertise Estimation
- Works both when ground-truth samples are available and when expertise is estimated via majority votes
- Estimates expertise well enough with ~10% of the budget
Conclusions
• A new proactive learning algorithm with multiple class-sensitive labelers performs better than the baselines
• Efficient estimation of each expert's expertise via the reduced per-class method
• Multi-class Information Density (MCID) as a new active learning criterion for noisy multi-class active learning
Future Work
• Theoretical min-max bounds of the proposed algorithm, under different reliabilities and costs of the experts
• Extend the framework to a crowdsourcing scenario with a larger pool of experts
Proactive Learning with Multiple Class-Sensitive Labelers
Seungwhan Moon, Jaime Carbonell
Language Technology Institute, School of Computer Science, Carnegie Mellon University
DSAA 2014 Conference 10/30/2014
MCID Performance
Performance when expertise was estimated via Majority Vote
Proactive Learning Algorithm
Expertise Estimation