Max-Margin Latent Variable Models M. Pawan Kumar

Page 1: Max-Margin Latent Variable Models M. Pawan Kumar

Max-Margin Latent Variable Models

M. Pawan Kumar

Page 2: Max-Margin Latent Variable Models M. Pawan Kumar

Max-Margin Latent Variable Models

M. Pawan Kumar

Daphne Koller, Ben Packer

Kevin Miller, Rafi Witten,

Tim Tang, Danny Goodman,

Haithem Turki, Dan Preston,

Dan Selsam, Andrej Karpathy

Page 3: Max-Margin Latent Variable Models M. Pawan Kumar

Computer Vision Data

[Plot: Log(Size) vs. Information]

Segmentation: ~ 2000

Page 4: Max-Margin Latent Variable Models M. Pawan Kumar

Computer Vision Data

[Plot: Log(Size) vs. Information]

Segmentation: ~ 2000

Bounding Box: ~ 12000

Page 5: Max-Margin Latent Variable Models M. Pawan Kumar

Computer Vision Data

[Plot: Log(Size) vs. Information]

Segmentation: ~ 2000

Bounding Box: ~ 12000

Image-Level (“Car”, “Chair”): > 14 M

Page 6: Max-Margin Latent Variable Models M. Pawan Kumar

Computer Vision Data

[Plot: Log(Size) vs. Information]

Segmentation: ~ 2000

Bounding Box: ~ 12000

Image-Level: > 14 M

Noisy Label: > 6 B

Learn with missing information (latent variables)

Page 7: Max-Margin Latent Variable Models M. Pawan Kumar

• Two Types of Problems

• Latent SVM (Background)

• Self-Paced Learning

• Max-Margin Min-Entropy Models

• Discussion

Outline

Page 8: Max-Margin Latent Variable Models M. Pawan Kumar

Annotation Mismatch

Learn to classify an image

Image x

Annotation a = “Deer”

Latent variable h

Mismatch between desired and available annotations

Exact value of latent variable is not “important”

Page 9: Max-Margin Latent Variable Models M. Pawan Kumar

Annotation Mismatch

Learn to classify a DNA sequence

Sequence x

Annotation a ∈ {+1, -1}

Latent Variables h

Mismatch between desired and possible annotations

Exact value of latent variable is not “important”

Page 10: Max-Margin Latent Variable Models M. Pawan Kumar

Output Mismatch

Learn to segment an image

Image x Output y

Page 11: Max-Margin Latent Variable Models M. Pawan Kumar

Output Mismatch

Learn to segment an image

Bird

(x, a) (a, h)

Page 12: Max-Margin Latent Variable Models M. Pawan Kumar

Output Mismatch

Learn to segment an image

Mismatch between desired output and available annotations

Exact value of latent variable is important

(x, a) (a, h)

Cow

Page 13: Max-Margin Latent Variable Models M. Pawan Kumar

Output Mismatch

Learn to classify actions

(x, y)

Page 14: Max-Margin Latent Variable Models M. Pawan Kumar

Output Mismatch

Learn to classify actions

x + “jumping”

$h_a = +1$

$h_b$

Page 15: Max-Margin Latent Variable Models M. Pawan Kumar

Output Mismatch

Learn to classify actions

x + “jumping”

$h_a = -1$

$h_b$

Mismatch between desired output and available annotations

Exact value of latent variable is important

Page 16: Max-Margin Latent Variable Models M. Pawan Kumar

• Two Types of Problems

• Latent SVM (Background)

• Self-Paced Learning

• Max-Margin Min-Entropy Models

• Discussion

Outline

Page 17: Max-Margin Latent Variable Models M. Pawan Kumar

Latent SVM

Image x

Annotation a = “Deer”

Latent variable h

Features $\Phi(x, a, h)$

Parameters $w$

Prediction: $(a(w), h(w)) = \operatorname{argmax}_{a,h} w^\top \Phi(x, a, h)$

Andrews et al, 2001; Smola et al, 2005; Felzenszwalb et al, 2008; Yu and Joachims, 2009
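
A minimal sketch of the prediction rule on this slide, assuming the annotation and latent spaces are small enough to enumerate. The names `predict` and `toy_features` and the block-structured feature map are hypothetical illustrations, not taken from the talk.

```python
import numpy as np

def predict(w, feature_fn, x, annotations, latents):
    """Brute-force search for the (a, h) pair with the highest score w^T Phi(x, a, h)."""
    best, best_score = None, -np.inf
    for a in annotations:
        for h in latents:
            score = float(w @ feature_fn(x, a, h))
            if score > best_score:
                best, best_score = (a, h), score
    return best, best_score

def toy_features(x, a, h, n_a=2, n_h=3):
    """Hypothetical joint feature map: copy x into the block owned by (a, h)."""
    phi = np.zeros(n_a * n_h * x.size)
    start = (a * n_h + h) * x.size
    phi[start:start + x.size] = x
    return phi

x = np.array([1.0, 2.0])
w = np.zeros(12)
w[8:10] = [0.5, 0.5]          # only the block for (a=1, h=1) has non-zero weight
pred, score = predict(w, toy_features, x, range(2), range(3))
```

In practice the search over h is usually done with a problem-specific inference routine rather than enumeration; the loop here only makes the definition concrete.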

Page 18: Max-Margin Latent Variable Models M. Pawan Kumar

Parameter Learning

Score of Best Completion of Ground-Truth

>

Score of All Other Outputs

Page 19: Max-Margin Latent Variable Models M. Pawan Kumar

Parameter Learning

$\max_h w^\top \Phi(x_i, a_i, h)$

>

$w^\top \Phi(x_i, a, h)$

Page 20: Max-Margin Latent Variable Models M. Pawan Kumar

Parameter Learning

$\min \|w\|^2 + C \sum_i \xi_i$

$\max_h w^\top \Phi(x_i, a_i, h) \geq w^\top \Phi(x_i, a, h) + \Delta(a_i, a) - \xi_i$

Annotation Mismatch
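
The slack implied by this constraint can be computed directly for one training sample. The sketch below assumes the scores $w^\top \Phi(x_i, a, h)$ have already been evaluated into a table; `latent_hinge` and the numbers are hypothetical.

```python
import numpy as np

def latent_hinge(scores, a_true, delta):
    """Smallest slack xi_i satisfying the latent SVM constraints for one sample.

    scores[a, h] holds w^T Phi(x_i, a, h); delta[a] holds Delta(a_i, a).
    """
    completed = scores[a_true].max()               # best completion of the ground truth
    violated = (scores + delta[:, None]).max()     # loss-augmented score of the best output
    return max(0.0, violated - completed)

scores = np.array([[1.0, 3.0, 0.0],
                   [2.5, 0.5, 1.0]])
delta = np.array([0.0, 1.0])                       # 0/1 loss when the true annotation is 0
xi = latent_hinge(scores, 0, delta)
```

Here the best completion of the true annotation scores 3.0 and the loss-augmented competitor scores 3.5, so the sample incurs a slack of 0.5.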

Page 21: Max-Margin Latent Variable Models M. Pawan Kumar

Optimization

Update $h_i^* = \operatorname{argmax}_h w^\top \Phi(x_i, a_i, h)$

Update $w$ by solving a convex problem

$\min \|w\|^2 + C \sum_i \xi_i$

$w^\top \Phi(x_i, a_i, h_i^*) - w^\top \Phi(x_i, a, h) \geq \Delta(a_i, a) - \xi_i$

Repeat until convergence
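
The alternation above can be sketched as follows. This is an illustration under several assumptions: the features are block-structured so that the score is `W[a, h] @ x`, the loss is 0/1, and the convex step is approximated with plain subgradient descent rather than the structural SVM solver a real implementation would use. All function names are hypothetical.

```python
import numpy as np

def scores(W, x):
    # W has shape (n_a, n_h, d); returns the (n_a, n_h) table of w^T Phi(x, a, h).
    return W @ x

def impute(W, X, A):
    # Step 1: h_i* = argmax_h w^T Phi(x_i, a_i, h) under the current parameters.
    return np.array([scores(W, x)[a].argmax() for x, a in zip(X, A)])

def update_w(W, X, A, H, C=1.0, lr=0.01, epochs=200):
    # Step 2 (stand-in solver): subgradient descent on ||w||^2 + C * sum_i xi_i
    # with the latent variables held fixed at H.
    W = W.copy()
    n_a = W.shape[0]
    for _ in range(epochs):
        grad = 2.0 * W                                        # gradient of ||w||^2
        for x, a, h in zip(X, A, H):
            s = scores(W, x) + (np.arange(n_a) != a)[:, None]  # add the 0/1 loss
            a_hat, h_hat = np.unravel_index(s.argmax(), s.shape)
            if s[a_hat, h_hat] > W[a, h] @ x:                 # constraint violated
                grad[a_hat, h_hat] += C * x
                grad[a, h] -= C * x
        W -= lr * grad
    return W

def cccp(X, A, n_a, n_h, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((n_a, n_h, X.shape[1]))
    H = impute(W, X, A)
    for _ in range(iters):
        W = update_w(W, X, A, H)
        H_new = impute(W, X, A)
        if np.array_equal(H_new, H):      # latent assignments stopped changing
            break
        H = H_new
    return W, H

X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
A = np.array([0, 0, 1, 1])
W, H = cccp(X, A, n_a=2, n_h=2)
```

The result depends on the random initialization, which is the sensitivity to local minima that motivates the next part of the talk.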

Page 22: Max-Margin Latent Variable Models M. Pawan Kumar

• Two Types of Problems

• Latent SVM (Background)

• Self-Paced Learning

• Max-Margin Min-Entropy Models

• Discussion

Outline

Page 23: Max-Margin Latent Variable Models M. Pawan Kumar

Self-Paced Learning

Kumar, Packer and Koller, NIPS 2010

1 + 1 = 2

1/3 + 1/6 = 1/2

$e^{i\pi} + 1 = 0$

Math is for losers!!

FAILURE … BAD LOCAL MINIMUM

Page 24: Max-Margin Latent Variable Models M. Pawan Kumar

Self-Paced Learning

Kumar, Packer and Koller, NIPS 2010

1 + 1 = 2

1/3 + 1/6 = 1/2

$e^{i\pi} + 1 = 0$

Euler was a Genius!!

SUCCESS … GOOD LOCAL MINIMUM

Page 25: Max-Margin Latent Variable Models M. Pawan Kumar

Optimization

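Update $h_i^* = \operatorname{argmax}_h w^\top \Phi(x_i, a_i, h)$

Update $w$ by solving a convex problem

$\min \|w\|^2 + C \sum_i v_i \xi_i - \lambda \sum_i v_i$

$w^\top \Phi(x_i, a_i, h_i^*) - w^\top \Phi(x_i, a, h) \geq \Delta(a_i, a) - \xi_i$

$v_i \in \{0, 1\}$

$\lambda \leftarrow \lambda \mu$

Repeat until convergence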
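
For fixed $w$, the binary indicators in the self-paced objective have a closed form: the term $C v_i \xi_i - \lambda v_i$ is negative exactly when $C \xi_i < \lambda$, so those samples are selected as "easy". The sketch below illustrates that rule and the growth of $\lambda$. The slack values and the choice $\mu = 2$ are hypothetical.

```python
import numpy as np

def select_easy(slacks, C, lam):
    """v_i = 1 if C * xi_i < lambda, else 0 (minimizes C*v_i*xi_i - lambda*v_i)."""
    return (C * np.asarray(slacks) < lam).astype(int)

slacks = np.array([0.0, 0.2, 0.9, 3.0])   # hypothetical per-sample slacks xi_i
C, lam, mu = 1.0, 0.5, 2.0                # mu > 1 is an assumed annealing factor
schedule = []
for _ in range(4):
    schedule.append(select_easy(slacks, C, lam).tolist())
    lam *= mu                             # lambda <- lambda * mu admits harder samples
```

With these numbers the first round keeps only the two lowest-slack samples and the last round keeps all four. In the full procedure the slacks would be recomputed after each parameter update rather than held fixed.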

Page 26: Max-Margin Latent Variable Models M. Pawan Kumar

Image Classification

271 images, 6 classes

90/10 train/test split

5 folds

Mammals Dataset

Page 27: Max-Margin Latent Variable Models M. Pawan Kumar

Image Classification

[Bar charts: Objective and Test Error, CCCP vs. SPL]

Kumar, Packer and Koller, NIPS 2010

HOG-Based Model. Dalal and Triggs, 2005

Page 28: Max-Margin Latent Variable Models M. Pawan Kumar

Image Classification

~ 5000 images

50/50 train/test split

5 folds

PASCAL VOC 2007 Dataset

Car vs. Not-Car

Page 29: Max-Margin Latent Variable Models M. Pawan Kumar

Image Classification

Witten, Miller, Kumar, Packer and Koller, In Preparation

Objective

HOG + Dense SIFT + Dense Color SIFT

SPL+ – Different features choose different “easy” samples

Page 30: Max-Margin Latent Variable Models M. Pawan Kumar

Image Classification

Witten, Miller, Kumar, Packer and Koller, In Preparation

Mean Average Precision

HOG + Dense SIFT + Dense Color SIFT

SPL+ – Different features choose different “easy” samples

Page 31: Max-Margin Latent Variable Models M. Pawan Kumar

Motif Finding

~ 40,000 sequences

50/50 train/test split

5 folds

UniProbe Dataset

Binding vs. Not-Binding

Page 32: Max-Margin Latent Variable Models M. Pawan Kumar

Motif Finding

[Bar charts: Objective and Test Error, CCCP vs. SPL]

Kumar, Packer and Koller, NIPS 2010

Motif + Markov Background Model. Yu and Joachims, 2009

Page 33: Max-Margin Latent Variable Models M. Pawan Kumar

Semantic Segmentation

Stanford Background: Train - 572 images, Validation - 53 images, Test - 90 images

+

VOC Segmentation 2009: Train - 1274 images, Validation - 225 images, Test - 750 images

Page 34: Max-Margin Latent Variable Models M. Pawan Kumar

Semantic Segmentation

VOC Detection 2009: Train - 1564 images (Bounding Box Data)

+

ImageNet: Train - 1000 images (Image-Level Data)

Page 35: Max-Margin Latent Variable Models M. Pawan Kumar

Semantic Segmentation

Kumar, Turki, Preston and Koller, ICCV 2011

[Bar charts: SBD Overlap and VOC Overlap, SUP vs. CCCP vs. SPL]

Region-based Model. Gould, Fulton and Koller, 2009

SUP – Supervised Learning (Segmentation Data Only)

Page 36: Max-Margin Latent Variable Models M. Pawan Kumar

Action Classification

PASCAL VOC 2011

Train – 3000 instances (Bounding Box Data)

+

Train - 10000 images (Noisy Data)

Test – 3000 instances

Page 37: Max-Margin Latent Variable Models M. Pawan Kumar

Action Classification

Packer, Kumar, Tang and Koller, In Preparation

[Bar chart: Mean Average Precision, SUP vs. CCCP vs. SPL]

Poselet-based Model. Maji, Bourdev and Malik, 2011

Page 38: Max-Margin Latent Variable Models M. Pawan Kumar

Self-Paced Multiple Kernel Learning

Kumar, Packer and Koller, In Preparation

1 + 1 = 2

1/3 + 1/6 = 1/2

$e^{i\pi} + 1 = 0$

Integers

Rational Numbers

Imaginary Numbers

USE A FIXED MODEL

Page 39: Max-Margin Latent Variable Models M. Pawan Kumar

Self-Paced Multiple Kernel Learning

Kumar, Packer and Koller, In Preparation

1 + 1 = 2

1/3 + 1/6 = 1/2

$e^{i\pi} + 1 = 0$

Integers

Rational Numbers

Imaginary Numbers

ADAPT THE MODEL COMPLEXITY

Page 40: Max-Margin Latent Variable Models M. Pawan Kumar

Optimization

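Update $h_i^* = \operatorname{argmax}_h w^\top \Phi(x_i, a_i, h)$

Update $w$ and $c$ by solving a convex problem

$\min \|w\|^2 + C \sum_i v_i \xi_i - \lambda \sum_i v_i$

$w^\top \Phi(x_i, a_i, h_i^*) - w^\top \Phi(x_i, a, h) \geq \Delta(a_i, a) - \xi_i$

$v_i \in \{0, 1\}$

$\lambda \leftarrow \lambda \mu$

$K_{ij} = \Phi(x_i, a_i, h_i)^\top \Phi(x_j, a_j, h_j)$

$K = \sum_k c_k K_k$

Repeat until convergence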
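
The kernel combination $K = \sum_k c_k K_k$ can be illustrated with a few lines of NumPy. The two random feature sets and the weights below are hypothetical. How the weights $c$ are learned is not shown on the slide and is not attempted here.

```python
import numpy as np

def combine_kernels(kernels, c):
    """K = sum_k c_k K_k for a list of (n, n) Gram matrices."""
    return np.tensordot(np.asarray(c, dtype=float), np.stack(kernels), axes=1)

rng = np.random.default_rng(0)
feats = [rng.standard_normal((5, 3)), rng.standard_normal((5, 4))]  # two feature types
kernels = [F @ F.T for F in feats]        # K_ij = Phi_i^T Phi_j for each feature type
K = combine_kernels(kernels, [0.7, 0.3])
```

With non-negative weights the combined matrix stays symmetric positive semidefinite, so it remains a valid kernel.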

Page 41: Max-Margin Latent Variable Models M. Pawan Kumar

Image Classification

271 images, 6 classes

90/10 train/test split

5 folds

Mammals Dataset

Page 42: Max-Margin Latent Variable Models M. Pawan Kumar

Image Classification

[Bar charts: Objective and Test Error, FIXED vs. SPMKL]

Kumar, Packer and Koller, In Preparation

HOG-Based Model. Dalal and Triggs, 2005

Page 43: Max-Margin Latent Variable Models M. Pawan Kumar

Motif Finding

~ 40,000 sequences

50/50 train/test split

5 folds

UniProbe Dataset

Binding vs. Not-Binding

Page 44: Max-Margin Latent Variable Models M. Pawan Kumar

Motif Finding

[Bar charts: Objective and Test Error, FIXED vs. SPMKL]

Kumar, Packer and Koller, NIPS 2010

Motif + Markov Background Model. Yu and Joachims, 2009

Page 45: Max-Margin Latent Variable Models M. Pawan Kumar

• Two Types of Problems

• Latent SVM (Background)

• Self-Paced Learning

• Max-Margin Min-Entropy Models

• Discussion

Outline

Page 46: Max-Margin Latent Variable Models M. Pawan Kumar

$\Pr(a, h | x) = \frac{\exp(w^\top \Phi(x, a, h))}{Z(x)}$

$\Pr(a_1, h | x)$:

0.00 0.00 0.25
0.00 0.25 0.00
0.00 0.00 0.25

MAP Inference

Page 47: Max-Margin Latent Variable Models M. Pawan Kumar

$\Pr(a, h | x) = \frac{\exp(w^\top \Phi(x, a, h))}{Z(x)}$

$\Pr(a_1, h | x)$:

0.00 0.00 0.25
0.00 0.25 0.00
0.00 0.00 0.25

$\Pr(a_2, h | x)$:

0.00 0.00 0.01
0.00 0.24 0.00
0.00 0.00 0.00

MAP Inference

$\min_{a,h} -\log \Pr(a, h | x)$

Value of latent variable?

Page 48: Max-Margin Latent Variable Models M. Pawan Kumar

Min-Entropy Inference

$\min_a -\log \Pr(a | x) + H_\alpha(\Pr(h | a, x))$

$= \min_a H_\alpha(Q(a; x, w))$

$Q(a; x, w)$ = Set of all $\{\Pr(a, h | x)\}$

Rényi entropy of generalized distribution
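
A small numerical sketch of min-entropy inference, using the two probability tables from the MAP inference slides. It assumes the standard definition of the Rényi entropy of a generalized (unnormalized) distribution, $H_\alpha(Q) = \frac{1}{1-\alpha} \log \frac{\sum_h q_h^\alpha}{\sum_h q_h}$, with the usual limits at $\alpha = 1$ and $\alpha = \infty$. The function name is hypothetical.

```python
import numpy as np

def renyi_generalized(q, alpha):
    """Renyi entropy of a generalized distribution q (non-negative, sum <= 1)."""
    q = np.asarray(q, dtype=float).ravel()
    q = q[q > 0]                           # zero entries contribute nothing
    total = q.sum()
    if np.isinf(alpha):
        return -np.log(q.max())            # limit alpha -> infinity
    if np.isclose(alpha, 1.0):
        return -(q * np.log(q)).sum() / total   # Shannon limit alpha -> 1
    return np.log((q ** alpha).sum() / total) / (1.0 - alpha)

# Joint tables Pr(a, h | x) copied from the slides, one per annotation.
joint = {
    "a1": [[0.00, 0.00, 0.25], [0.00, 0.25, 0.00], [0.00, 0.00, 0.25]],
    "a2": [[0.00, 0.00, 0.01], [0.00, 0.24, 0.00], [0.00, 0.00, 0.00]],
}

def predict(alpha):
    # Min-entropy inference: choose the annotation whose Q(a; x, w) has least entropy.
    return min(joint, key=lambda a: renyi_generalized(joint[a], alpha))

prediction = predict(np.inf)
```

The same function can be used to check numerically that the generalized entropy splits into $-\log \Pr(a | x)$ plus the entropy of the conditional $\Pr(h | a, x)$, as the slide states.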

Page 49: Max-Margin Latent Variable Models M. Pawan Kumar

Max-Margin Min-Entropy Models

Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

$\min \|w\|^2 + C \sum_i \xi_i$

$H_\alpha(Q(a; x_i, w)) - H_\alpha(Q(a_i; x_i, w)) \geq \Delta(a_i, a) - \xi_i$

$\xi_i \geq 0$

Like latent SVM, minimizes $\Delta(a_i, a_i(w))$

In fact, when $\alpha = \infty$...

Page 50: Max-Margin Latent Variable Models M. Pawan Kumar

Max-Margin Min-Entropy Models

Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

$\min \|w\|^2 + C \sum_i \xi_i$

$\max_h w^\top \Phi(x_i, a_i, h) - \max_h w^\top \Phi(x_i, a, h) \geq \Delta(a_i, a) - \xi_i$

$\xi_i \geq 0$

Like latent SVM, minimizes $\Delta(a_i, a_i(w))$

In fact, when $\alpha = \infty$... Latent SVM
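
The reduction to latent SVM at $\alpha = \infty$ can be worked out in two lines. The derivation below assumes the log-linear form of $\Pr(a, h | x)$ given earlier and the standard $\alpha \to \infty$ limit of the Rényi entropy. The symbol $\Phi$ for the joint features is a notational assumption.

```latex
\begin{align*}
H_\infty(Q(a; x_i, w))
  &= -\log \max_h \Pr(a, h \mid x_i)
   = -\max_h w^\top \Phi(x_i, a, h) + \log Z(x_i), \\
H_\infty(Q(a; x_i, w)) - H_\infty(Q(a_i; x_i, w))
  &= \max_h w^\top \Phi(x_i, a_i, h) - \max_h w^\top \Phi(x_i, a, h).
\end{align*}
```

The partition function $Z(x_i)$ cancels in the difference, which leaves the latent SVM margin constraint.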

Page 51: Max-Margin Latent Variable Models M. Pawan Kumar

Image Classification

271 images, 6 classes

90/10 train/test split

5 folds

Mammals Dataset

Page 52: Max-Margin Latent Variable Models M. Pawan Kumar

Image Classification

Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

HOG-Based Model. Dalal and Triggs, 2005

Page 55: Max-Margin Latent Variable Models M. Pawan Kumar

Motif Finding

~ 40,000 sequences

50/50 train/test split

5 folds

UniProbe Dataset

Binding vs. Not-Binding

Page 56: Max-Margin Latent Variable Models M. Pawan Kumar

Motif Finding

Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

Motif + Markov Background Model. Yu and Joachims, 2009

Page 57: Max-Margin Latent Variable Models M. Pawan Kumar

• Two Types of Problems

• Latent SVM (Background)

• Self-Paced Learning

• Max-Margin Min-Entropy Models

• Discussion

Outline

Page 58: Max-Margin Latent Variable Models M. Pawan Kumar

Very Large Datasets

• Initialize parameters using supervised data

• Impute latent variables (inference)

• Select easy samples (very efficient)

• Update parameters using incremental SVM

• Refine efficiently with proximal regularization

Page 59: Max-Margin Latent Variable Models M. Pawan Kumar

Output Mismatch

$\sum_h \Pr_\theta(h | a, x) \, \Delta(a, h, a(w), h(w)) + A(\theta)$

C. R. Rao’s Relative Quadratic Entropy

Minimize over w and θ

Page 60: Max-Margin Latent Variable Models M. Pawan Kumar

Output Mismatch

$\sum_h \Pr_\theta(h | a, x) \, \Delta(a, h, a(w), h(w)) + A(\theta)$

C. R. Rao’s Relative Quadratic Entropy

Minimize over w

[Plot: $\Pr_\theta(h, a | x)$ over $(a_1, h)$ and $(a_2, h)$]

Page 61: Max-Margin Latent Variable Models M. Pawan Kumar

Output Mismatch

$\sum_h \Pr_\theta(h | a, x) \, \Delta(a, h, a(w), h(w)) + A(\theta)$

C. R. Rao’s Relative Quadratic Entropy

Minimize over w

[Plot: $\Pr_\theta(h, a | x)$ over $(a_1, h)$ and $(a_2, h)$]

Page 62: Max-Margin Latent Variable Models M. Pawan Kumar

Output Mismatch

$\sum_h \Pr_\theta(h | a, x) \, \Delta(a, h, a(w), h(w)) + A(\theta)$

C. R. Rao’s Relative Quadratic Entropy

Minimize over θ

[Plot: $\Pr_\theta(h, a | x)$ over $(a_1, h)$ and $(a_2, h)$]

Page 65: Max-Margin Latent Variable Models M. Pawan Kumar

Questions?