Max-Margin Latent Variable Models

M. Pawan Kumar

Daphne Koller, Ben Packer

Kevin Miller, Rafi Witten, Tim Tang, Danny Goodman, Haithem Turki, Dan Preston, Dan Selsam, Andrej Karpathy

Computer Vision Data

[Chart: annotation information (x-axis) vs. log dataset size (y-axis)]

  Segmentation   ~ 2000
  Bounding Box   ~ 12000
  Image-Level    > 14 M   (e.g. “Car”, “Chair”)
  Noisy Label    > 6 B

Learn with missing information (latent variables)

Outline

• Two Types of Problems
• Latent SVM (Background)
• Self-Paced Learning
• Max-Margin Min-Entropy Models
• Discussion

Annotation Mismatch

Learn to classify an image

Image x
Annotation a = “Deer”
Latent variable h (e.g., the object’s location)

Mismatch between desired and available annotations
Exact value of latent variable is not “important”

Annotation Mismatch

Learn to classify a DNA sequence

Sequence x
Annotation a ∈ {+1, −1}
Latent variables h (e.g., the motif position)

Mismatch between desired and possible annotations
Exact value of latent variable is not “important”

Output Mismatch

Learn to segment an image

Image x, output y = (a, h): class annotation a (e.g., “Bird”, “Cow”) plus segmentation h
Training data provides only (x, a); the desired output is (a, h)

Mismatch between desired output and available annotations
Exact value of latent variable is important

Output Mismatch

Learn to classify actions (“jumping”)

Training pair (x, y): image x with person boxes ha, hb; each person carries an annotation a = +1 (jumping) or a = −1

Mismatch between desired output and available annotations
Exact value of latent variable is important

Outline

• Two Types of Problems
• Latent SVM (Background)
• Self-Paced Learning
• Max-Margin Min-Entropy Models
• Discussion

Latent SVM

Image x
Annotation a = “Deer”
Latent variable h

Features Φ(x, a, h)
Parameters w
Score w^T Φ(x, a, h)

Inference: (a(w), h(w)) = argmax_{a,h} w^T Φ(x, a, h)

Andrews et al., 2001; Smola et al., 2005; Felzenszwalb et al., 2008; Yu and Joachims, 2009
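To make the scoring and joint inference concrete, here is a minimal sketch in Python. The feature map phi and the candidate sets for a and h are toy placeholders introduced for illustration, not the talk's actual vision or DNA features.

```python
import numpy as np

def score(w, phi, x, a, h):
    """Linear score w^T phi(x, a, h)."""
    return w @ phi(x, a, h)

def infer(w, phi, x, annotations, latents):
    """(a(w), h(w)) = argmax over (a, h) of the linear score."""
    return max(((a, h) for a in annotations for h in latents),
               key=lambda ah: score(w, phi, x, ah[0], ah[1]))

# Toy usage: a in {+1, -1} flips the features, h in {0, 1} picks a "view".
phi = lambda x, a, h: a * np.concatenate([x * (h == 0), x * (h == 1)])
w = np.array([1.0, -0.5, 0.2, 0.3])
a_star, h_star = infer(w, phi, np.array([0.4, 1.2]), [+1, -1], [0, 1])
print(a_star, h_star)
```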

Parameter Learning

Score of the best completion of the ground-truth  >  score of all other outputs

Parameter Learning (annotation mismatch)

min_w ||w||^2 + C Σ_i ξ_i

s.t.  max_h w^T Φ(x_i, a_i, h)  ≥  w^T Φ(x_i, a, h) + Δ(a_i, a) − ξ_i,  for all (a, h)

Optimization

Repeat until convergence:

• Update h_i* = argmax_h w^T Φ(x_i, a_i, h)

• Update w by solving a convex problem:
  min_w ||w||^2 + C Σ_i ξ_i
  s.t. w^T Φ(x_i, a_i, h_i*) − w^T Φ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i
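A runnable sketch of this alternating procedure follows. For brevity the inner convex problem is approximated by subgradient descent on the hinged constraints rather than a QP solver, and phi, delta, and the candidate sets are assumed small and enumerable; all of these are illustrative assumptions, not the talk's actual solver.

```python
import numpy as np

def cccp_latent_svm(data, annotations, latents, phi, delta, dim,
                    C=1.0, outer_iters=10, inner_iters=100, lr=0.01):
    """Alternating minimization for latent SVM (sketch)."""
    w = np.zeros(dim)
    for _ in range(outer_iters):
        # Step 1: impute latent variables h_i* = argmax_h w^T phi(x_i, a_i, h).
        h_star = [max(latents, key=lambda h: w @ phi(x, a, h))
                  for x, a in data]
        # Step 2: approximately minimize ||w||^2 + C sum_i xi_i by
        # subgradient descent, with h_i* held fixed.
        for _ in range(inner_iters):
            grad = 2.0 * w
            for (x, a_i), h_i in zip(data, h_star):
                # Loss-augmented inference: most violated (a, h).
                a_hat, h_hat = max(
                    ((a, h) for a in annotations for h in latents),
                    key=lambda ah: w @ phi(x, ah[0], ah[1]) + delta(a_i, ah[0]))
                slack = (w @ phi(x, a_hat, h_hat) + delta(a_i, a_hat)
                         - w @ phi(x, a_i, h_i))
                if slack > 0:  # xi_i > 0: margin constraint violated
                    grad += C * (phi(x, a_hat, h_hat) - phi(x, a_i, h_i))
            w -= lr * grad
    return w

# Toy usage with the same placeholder feature map as before (an assumption):
phi = lambda x, a, h: a * np.concatenate([x * (h == 0), x * (h == 1)])
delta = lambda a_true, a: float(a_true != a)
data = [(np.array([0.4, 1.2]), +1), (np.array([-0.3, 0.9]), -1)]
w = cccp_latent_svm(data, [+1, -1], [0, 1], phi, delta, dim=4)
```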

Outline

• Two Types of Problems
• Latent SVM (Background)
• Self-Paced Learning
• Max-Margin Min-Entropy Models
• Discussion

Self-Paced Learning
Kumar, Packer and Koller, NIPS 2010

Teach everything at once:  1 + 1 = 2,  1/3 + 1/6 = 1/2,  e^{iπ} + 1 = 0

“Math is for losers!!”

FAILURE … BAD LOCAL MINIMUM

Self-Paced Learning
Kumar, Packer and Koller, NIPS 2010

Teach the easy concepts first:  1 + 1 = 2,  then  1/3 + 1/6 = 1/2,  then  e^{iπ} + 1 = 0

“Euler was a Genius!!”

SUCCESS … GOOD LOCAL MINIMUM

Optimization

Repeat until convergence:

• Update h_i* = argmax_h w^T Φ(x_i, a_i, h)

• Update w and the selection variables v_i ∈ {0, 1} by solving
  min_{w,v} ||w||^2 + C Σ_i v_i ξ_i − λ Σ_i v_i
  s.t. w^T Φ(x_i, a_i, h_i*) − w^T Φ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i

• Anneal λ ← λμ
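With v_i binary, the selection step decouples per sample: minimizing C v_i ξ_i − λ v_i sets v_i = 1 exactly when C ξ_i < λ, i.e., when the sample is currently "easy". A tiny sketch of that rule and the annealing loop, with toy slack values chosen for illustration:

```python
import numpy as np

def spl_select(losses, lam, C=1.0):
    """v_i = 1 iff the sample's weighted loss beats the payoff lambda."""
    return (C * np.asarray(losses) < lam).astype(int)

# Annealing: start with a small lambda (few, easy samples) and grow it by
# a factor mu each outer iteration until all samples are selected.
lam, mu = 0.1, 1.3
losses = [0.05, 0.4, 1.7, 0.0]   # toy per-sample slack values xi_i
for _ in range(5):
    v = spl_select(losses, lam)
    print(v, round(lam, 3))
    lam *= mu                    # lambda <- lambda * mu
```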

Image Classification

Mammals Dataset: 271 images, 6 classes
90/10 train/test split, 5 folds

Image Classification
Kumar, Packer and Koller, NIPS 2010

[Charts: training objective (≈4.4–4.75) and test error (≈14.5–17.5%) for CCCP vs. SPL]

HOG-Based Model. Dalal and Triggs, 2005

Image Classification

PASCAL VOC 2007 Dataset: ~5000 images, Car vs. Not-Car
50/50 train/test split, 5 folds

Image Classification
Witten, Miller, Kumar, Packer and Koller, In Preparation

Features: HOG + Dense SIFT + Dense Color SIFT
SPL+ – different features choose different “easy” samples

[Charts: training objective and mean average precision]

Motif Finding

UniProbe Dataset: ~40,000 sequences, Binding vs. Not-Binding
50/50 train/test split, 5 folds

Motif Finding
Kumar, Packer and Koller, NIPS 2010

[Charts: training objective (≈0–140) and test error (≈28–36%) for CCCP vs. SPL]

Motif + Markov Background Model. Yu and Joachims, 2009

Semantic Segmentation

Stanford Background: Train 572 images, Validation 53 images, Test 90 images
VOC Segmentation 2009: Train 1274 images, Validation 225 images, Test 750 images

Plus weakly annotated data:

VOC Detection 2009: Train 1564 images (Bounding Box Data)
ImageNet: Train 1000 images (Image-Level Data)

Semantic Segmentation
Kumar, Turki, Preston and Koller, ICCV 2011

[Charts: VOC overlap (≈22–30) and SBD overlap (≈52–55.5) for SUP, CCCP, SPL]

Region-based Model. Gould, Fulton and Koller, 2009
SUP – Supervised Learning (Segmentation Data Only)

Action Classification

PASCAL VOC 2011: Train 3000 instances (Bounding Box Data) + 10000 images (Noisy Data)
Test 3000 instances

Action Classification
Packer, Kumar, Tang and Koller, In Preparation

[Chart: mean average precision (≈60.8–62.8) for SUP, CCCP, SPL]

Poselet-based Model. Maji, Bourdev and Malik, 2011

Self-Paced Multiple Kernel Learning
Kumar, Packer and Koller, In Preparation

1 + 1 = 2  (Integers)
1/3 + 1/6 = 1/2  (Rational Numbers)
e^{iπ} + 1 = 0  (Imaginary Numbers)

USE A FIXED MODEL … or, as the concepts get harder, ADAPT THE MODEL COMPLEXITY

Self-Paced Multiple Kernel Learning

Optimization

Repeat until convergence:

• Update h_i* = argmax_h w^T Φ(x_i, a_i, h)

• Update w and c by solving
  min_{w,v,c} ||w||^2 + C Σ_i v_i ξ_i − λ Σ_i v_i,   v_i ∈ {0, 1}
  s.t. w^T Φ(x_i, a_i, h_i*) − w^T Φ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i

• Anneal λ ← λμ

Kernel: K_ij = Φ(x_i, a_i, h_i)^T Φ(x_j, a_j, h_j),  with K = Σ_k c_k K_k
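The multiple-kernel piece is just a weighted sum of base Gram matrices. A small sketch, where the linear and RBF base kernels and the weights c are toy assumptions, not the talk's actual kernels:

```python
import numpy as np

def combine_kernels(base_kernels, c):
    """K = sum_k c_k K_k for a list of Gram matrices and weights c_k >= 0."""
    return sum(ck * Kk for ck, Kk in zip(c, base_kernels))

# Toy example: linear and RBF Gram matrices on the same points.
X = np.random.default_rng(0).normal(size=(5, 3))
K_lin = X @ X.T
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_rbf = np.exp(-0.5 * sq)
K = combine_kernels([K_lin, K_rbf], c=[0.7, 0.3])
```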

Image Classification

Mammals Dataset: 271 images, 6 classes
90/10 train/test split, 5 folds

Image Classification
Kumar, Packer and Koller, In Preparation

[Charts: training objective (≈0–1) and test error (≈0–18%) for FIXED vs. SPMKL]

HOG-Based Model. Dalal and Triggs, 2005

Motif Finding

UniProbe Dataset: ~40,000 sequences, Binding vs. Not-Binding
50/50 train/test split, 5 folds

Motif Finding
Kumar, Packer and Koller, NIPS 2010

[Charts: training objective (≈69–78) and test error (≈8.5–11.5%) for FIXED vs. SPMKL]

Motif + Markov Background Model. Yu and Joachims, 2009

Outline

• Two Types of Problems
• Latent SVM (Background)
• Self-Paced Learning
• Max-Margin Min-Entropy Models
• Discussion

MAP Inference

Pr(a, h | x) = exp(w^T Φ(x, a, h)) / Z(x)

Pr(a1, h | x):          Pr(a2, h | x):
0.00  0.00  0.25        0.00  0.00  0.01
0.00  0.25  0.00        0.00  0.24  0.00
0.00  0.00  0.25        0.00  0.00  0.00
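A quick numeric check of the two distributions above (the matrices are taken directly from the slide): a1 carries three times the total mass of a2, yet its largest single cell (0.25) barely beats a2's (0.24), and that single-cell comparison is all that joint MAP inference sees.

```python
import numpy as np

P_a1 = np.array([[0.00, 0.00, 0.25],
                 [0.00, 0.25, 0.00],
                 [0.00, 0.00, 0.25]])
P_a2 = np.array([[0.00, 0.00, 0.01],
                 [0.00, 0.24, 0.00],
                 [0.00, 0.00, 0.00]])

print(P_a1.sum(), P_a2.sum())   # marginals Pr(a|x): 0.75 vs 0.25
print(P_a1.max(), P_a2.max())   # best joint cells:  0.25 vs 0.24
```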

MAP inference, min_{a,h} −log Pr(a, h | x), commits to a single (a, h) pair. But what value should the latent variable take?

Min-Entropy Inference

min_a [ −log Pr(a | x) + Hα(Pr(h | a, x)) ]  =  min_a Hα(Q(a; x, w))

Q(a; x, w) = set of all {Pr(a, h | x)}, with Pr(a, h | x) = exp(w^T Φ(x, a, h)) / Z(x)

Hα = Rényi entropy of the generalized distribution
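A minimal numeric sketch of min-entropy inference. The formula used below for the Rényi entropy of a generalized (unnormalized) distribution, Hα(Q) = log(Σ_h q_h^α / Σ_h q_h) / (1 − α), is an assumption chosen so that it decomposes as −log Pr(a|x) + Hα(Pr(h|a, x)), matching the slide; the two distributions are the ones from the MAP-inference example.

```python
import numpy as np

def renyi_entropy(q, alpha):
    """H_alpha of a generalized (unnormalized) distribution q (assumed form)."""
    q = np.asarray(q, dtype=float)
    if np.isinf(alpha):
        return -np.log(q.max())    # alpha -> infinity limit: -log max_h q_h
    return np.log((q ** alpha).sum() / q.sum()) / (1.0 - alpha)

# Generalized distributions Q(a; x, w) from the MAP-inference example,
# keeping only the nonzero entries of Pr(a, h | x).
Q = {"a1": [0.25, 0.25, 0.25], "a2": [0.01, 0.24]}

# Min-entropy inference: pick the annotation whose generalized
# distribution has the smallest Renyi entropy.
for alpha in (1.5, 10.0, np.inf):
    best = min(Q, key=lambda a: renyi_entropy(Q[a], alpha))
    print(alpha, best, {a: round(renyi_entropy(Q[a], alpha), 3) for a in Q})
```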

Max-Margin Min-Entropy Models
Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

min_w ||w||^2 + C Σ_i ξ_i
s.t. Hα(Q(a; x_i, w)) − Hα(Q(a_i; x_i, w)) ≥ Δ(a_i, a) − ξ_i,   ξ_i ≥ 0

Like latent SVM, minimizes Δ(a_i, a_i(w))

In fact, when α = ∞ the constraints become

max_h w^T Φ(x_i, a_i, h) − max_h w^T Φ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i,   ξ_i ≥ 0

i.e., latent SVM is recovered as the special case α = ∞.

Image Classification

Mammals Dataset: 271 images, 6 classes
90/10 train/test split, 5 folds

Image Classification
Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

[Results charts]

HOG-Based Model. Dalal and Triggs, 2005

Motif Finding

UniProbe Dataset: ~40,000 sequences, Binding vs. Not-Binding
50/50 train/test split, 5 folds

Motif Finding
Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

[Results charts]

Motif + Markov Background Model. Yu and Joachims, 2009

Outline

• Two Types of Problems
• Latent SVM (Background)
• Self-Paced Learning
• Max-Margin Min-Entropy Models
• Discussion

Very Large Datasets

• Initialize parameters using supervised data

• Impute latent variables (inference)

• Select easy samples (very efficient)

• Update parameters using incremental SVM

• Refine efficiently with proximal regularization (see the sketch below)
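A high-level skeleton of that recipe, purely illustrative: every component here (the incremental SVM trainer, the imputation and selection rules, the proximal refinement) is an assumed placeholder passed in as a function, not the talk's actual system.

```python
def large_scale_spl(supervised_data, weak_data, train_svm_incremental,
                    impute, easy, refine_proximal, rounds=10):
    """Skeleton only: all components are injected placeholders."""
    model = train_svm_incremental(None, supervised_data)  # init from supervised data
    for _ in range(rounds):
        latent = [impute(model, x) for x in weak_data]    # impute latent variables
        batch = [(x, h) for x, h in zip(weak_data, latent)
                 if easy(model, x, h)]                    # select easy samples cheaply
        model = train_svm_incremental(model, batch)       # incremental SVM update
        model = refine_proximal(model)                    # proximal-regularized refinement
    return model
```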

Output Mismatch

Loss: Σ_h Prθ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ)

C. R. Rao’s Relative Quadratic Entropy

Minimize over w and θ, alternating between the two.

[Plots: the distribution Prθ(h, a | x) over outputs (a1, h) and (a2, h), shown evolving during the minimization over w and then over θ]

Questions?