Max-Margin Latent Variable Models

M. Pawan Kumar

Daphne Koller, Ben Packer

Kevin Miller, Rafi Witten, Tim Tang, Danny Goodman, Haithem Turki, Dan Preston, Dan Selsam, Andrej Karpathy

Computer Vision Data

[Chart: annotation information (x-axis) vs. log dataset size (y-axis)]

  Segmentation   ~ 2000
  Bounding Box   ~ 12000
  Image-Level    > 14 M   (e.g. “Car”, “Chair”)
  Noisy Label    > 6 B

Learn with missing information (latent variables)

Outline

• Two Types of Problems
• Latent SVM (Background)
• Self-Paced Learning
• Max-Margin Min-Entropy Models
• Discussion

Annotation Mismatch

Learn to classify an image

Image x
Annotation a = “Deer”
Latent variable h (e.g., the object’s location)

Mismatch between desired and available annotations
Exact value of latent variable is not “important”

Annotation Mismatch

Learn to classify a DNA sequence

Sequence x
Annotation a ∈ {+1, −1}
Latent variables h (e.g., the motif position)

Mismatch between desired and possible annotations
Exact value of latent variable is not “important”

Output Mismatch

Learn to segment an image

Image x, output y = (a, h): class annotation a (e.g., “Bird”, “Cow”) plus segmentation h
Training data provides only (x, a); the desired output is (a, h)

Mismatch between desired output and available annotations
Exact value of latent variable is important

Output Mismatch

Learn to classify actions (“jumping”)

Training pair (x, y): image x with person boxes ha, hb; each person carries an annotation a = +1 (jumping) or a = −1

Mismatch between desired output and available annotations
Exact value of latent variable is important

Outline

• Two Types of Problems
• Latent SVM (Background)
• Self-Paced Learning
• Max-Margin Min-Entropy Models
• Discussion

Latent SVM

Image x
Annotation a = “Deer”
Latent variable h

Features Φ(x, a, h)
Parameters w
Score w^T Φ(x, a, h)

Inference: (a(w), h(w)) = argmax_{a,h} w^T Φ(x, a, h)

Andrews et al., 2001; Smola et al., 2005; Felzenszwalb et al., 2008; Yu and Joachims, 2009
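To make the scoring and joint inference concrete, here is a minimal sketch in Python. The feature map phi and the candidate sets for a and h are toy placeholders introduced for illustration, not the talk's actual vision or DNA features.

```python
import numpy as np

def score(w, phi, x, a, h):
    """Linear score w^T phi(x, a, h)."""
    return w @ phi(x, a, h)

def infer(w, phi, x, annotations, latents):
    """(a(w), h(w)) = argmax over (a, h) of the linear score."""
    return max(((a, h) for a in annotations for h in latents),
               key=lambda ah: score(w, phi, x, ah[0], ah[1]))

# Toy usage: a in {+1, -1} flips the features, h in {0, 1} picks a "view".
phi = lambda x, a, h: a * np.concatenate([x * (h == 0), x * (h == 1)])
w = np.array([1.0, -0.5, 0.2, 0.3])
a_star, h_star = infer(w, phi, np.array([0.4, 1.2]), [+1, -1], [0, 1])
print(a_star, h_star)
```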

Parameter Learning

Score of the best completion of the ground-truth  >  score of all other outputs

Parameter Learning (annotation mismatch)

min_w ||w||^2 + C Σ_i ξ_i

s.t.  max_h w^T Φ(x_i, a_i, h)  ≥  w^T Φ(x_i, a, h) + Δ(a_i, a) − ξ_i,  for all (a, h)

Optimization

Repeat until convergence:

• Update h_i* = argmax_h w^T Φ(x_i, a_i, h)

• Update w by solving a convex problem:
  min_w ||w||^2 + C Σ_i ξ_i
  s.t. w^T Φ(x_i, a_i, h_i*) − w^T Φ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i
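A runnable sketch of this alternating procedure follows. For brevity the inner convex problem is approximated by subgradient descent on the hinged constraints rather than a QP solver, and phi, delta, and the candidate sets are assumed small and enumerable; all of these are illustrative assumptions, not the talk's actual solver.

```python
import numpy as np

def cccp_latent_svm(data, annotations, latents, phi, delta, dim,
                    C=1.0, outer_iters=10, inner_iters=100, lr=0.01):
    """Alternating minimization for latent SVM (sketch)."""
    w = np.zeros(dim)
    for _ in range(outer_iters):
        # Step 1: impute latent variables h_i* = argmax_h w^T phi(x_i, a_i, h).
        h_star = [max(latents, key=lambda h: w @ phi(x, a, h))
                  for x, a in data]
        # Step 2: approximately minimize ||w||^2 + C sum_i xi_i by
        # subgradient descent, with h_i* held fixed.
        for _ in range(inner_iters):
            grad = 2.0 * w
            for (x, a_i), h_i in zip(data, h_star):
                # Loss-augmented inference: most violated (a, h).
                a_hat, h_hat = max(
                    ((a, h) for a in annotations for h in latents),
                    key=lambda ah: w @ phi(x, ah[0], ah[1]) + delta(a_i, ah[0]))
                slack = (w @ phi(x, a_hat, h_hat) + delta(a_i, a_hat)
                         - w @ phi(x, a_i, h_i))
                if slack > 0:  # xi_i > 0: margin constraint violated
                    grad += C * (phi(x, a_hat, h_hat) - phi(x, a_i, h_i))
            w -= lr * grad
    return w

# Toy usage with the same placeholder feature map as before (an assumption):
phi = lambda x, a, h: a * np.concatenate([x * (h == 0), x * (h == 1)])
delta = lambda a_true, a: float(a_true != a)
data = [(np.array([0.4, 1.2]), +1), (np.array([-0.3, 0.9]), -1)]
w = cccp_latent_svm(data, [+1, -1], [0, 1], phi, delta, dim=4)
```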

Outline

• Two Types of Problems
• Latent SVM (Background)
• Self-Paced Learning
• Max-Margin Min-Entropy Models
• Discussion

Self-Paced Learning
Kumar, Packer and Koller, NIPS 2010

Teach everything at once:  1 + 1 = 2,  1/3 + 1/6 = 1/2,  e^{iπ} + 1 = 0

“Math is for losers!!”

FAILURE … BAD LOCAL MINIMUM

Self-Paced Learning
Kumar, Packer and Koller, NIPS 2010

Teach the easy concepts first:  1 + 1 = 2,  then  1/3 + 1/6 = 1/2,  then  e^{iπ} + 1 = 0

“Euler was a Genius!!”

SUCCESS … GOOD LOCAL MINIMUM

Optimization

Repeat until convergence:

• Update h_i* = argmax_h w^T Φ(x_i, a_i, h)

• Update w and the selection variables v_i ∈ {0, 1} by solving
  min_{w,v} ||w||^2 + C Σ_i v_i ξ_i − λ Σ_i v_i
  s.t. w^T Φ(x_i, a_i, h_i*) − w^T Φ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i

• Anneal λ ← λμ
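With v_i binary, the selection step decouples per sample: minimizing C v_i ξ_i − λ v_i sets v_i = 1 exactly when C ξ_i < λ, i.e., when the sample is currently "easy". A tiny sketch of that rule and the annealing loop, with toy slack values chosen for illustration:

```python
import numpy as np

def spl_select(losses, lam, C=1.0):
    """v_i = 1 iff the sample's weighted loss beats the payoff lambda."""
    return (C * np.asarray(losses) < lam).astype(int)

# Annealing: start with a small lambda (few, easy samples) and grow it by
# a factor mu each outer iteration until all samples are selected.
lam, mu = 0.1, 1.3
losses = [0.05, 0.4, 1.7, 0.0]   # toy per-sample slack values xi_i
for _ in range(5):
    v = spl_select(losses, lam)
    print(v, round(lam, 3))
    lam *= mu                    # lambda <- lambda * mu
```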

Image Classification

Mammals Dataset: 271 images, 6 classes
90/10 train/test split, 5 folds

Image Classification
Kumar, Packer and Koller, NIPS 2010

[Charts: training objective (≈4.4–4.75) and test error (≈14.5–17.5%) for CCCP vs. SPL]

HOG-Based Model. Dalal and Triggs, 2005

Image Classification

PASCAL VOC 2007 Dataset: ~5000 images, Car vs. Not-Car
50/50 train/test split, 5 folds

Image Classification
Witten, Miller, Kumar, Packer and Koller, In Preparation

Features: HOG + Dense SIFT + Dense Color SIFT
SPL+ – different features choose different “easy” samples

[Charts: training objective and mean average precision]

Motif Finding

UniProbe Dataset: ~40,000 sequences, Binding vs. Not-Binding
50/50 train/test split, 5 folds

Motif Finding
Kumar, Packer and Koller, NIPS 2010

[Charts: training objective (≈0–140) and test error (≈28–36%) for CCCP vs. SPL]

Motif + Markov Background Model. Yu and Joachims, 2009

Semantic Segmentation

Stanford Background: Train 572 images, Validation 53 images, Test 90 images
VOC Segmentation 2009: Train 1274 images, Validation 225 images, Test 750 images

Plus weakly annotated data:

VOC Detection 2009: Train 1564 images (Bounding Box Data)
ImageNet: Train 1000 images (Image-Level Data)

Semantic Segmentation
Kumar, Turki, Preston and Koller, ICCV 2011

[Charts: VOC overlap (≈22–30) and SBD overlap (≈52–55.5) for SUP, CCCP, SPL]

Region-based Model. Gould, Fulton and Koller, 2009
SUP – Supervised Learning (Segmentation Data Only)

Action Classification

PASCAL VOC 2011: Train 3000 instances (Bounding Box Data) + 10000 images (Noisy Data)
Test 3000 instances

Action Classification
Packer, Kumar, Tang and Koller, In Preparation

[Chart: mean average precision (≈60.8–62.8) for SUP, CCCP, SPL]

Poselet-based Model. Maji, Bourdev and Malik, 2011

Self-Paced Multiple Kernel Learning
Kumar, Packer and Koller, In Preparation

1 + 1 = 2  (Integers)
1/3 + 1/6 = 1/2  (Rational Numbers)
e^{iπ} + 1 = 0  (Imaginary Numbers)

USE A FIXED MODEL … or, as the concepts get harder, ADAPT THE MODEL COMPLEXITY

Self-Paced Multiple Kernel Learning

Optimization

Repeat until convergence:

• Update h_i* = argmax_h w^T Φ(x_i, a_i, h)

• Update w and c by solving
  min_{w,v,c} ||w||^2 + C Σ_i v_i ξ_i − λ Σ_i v_i,   v_i ∈ {0, 1}
  s.t. w^T Φ(x_i, a_i, h_i*) − w^T Φ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i

• Anneal λ ← λμ

Kernel: K_ij = Φ(x_i, a_i, h_i)^T Φ(x_j, a_j, h_j),  with K = Σ_k c_k K_k
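The multiple-kernel piece is just a weighted sum of base Gram matrices. A small sketch, where the linear and RBF base kernels and the weights c are toy assumptions, not the talk's actual kernels:

```python
import numpy as np

def combine_kernels(base_kernels, c):
    """K = sum_k c_k K_k for a list of Gram matrices and weights c_k >= 0."""
    return sum(ck * Kk for ck, Kk in zip(c, base_kernels))

# Toy example: linear and RBF Gram matrices on the same points.
X = np.random.default_rng(0).normal(size=(5, 3))
K_lin = X @ X.T
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_rbf = np.exp(-0.5 * sq)
K = combine_kernels([K_lin, K_rbf], c=[0.7, 0.3])
```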

Image Classification

Mammals Dataset: 271 images, 6 classes
90/10 train/test split, 5 folds

Image Classification
Kumar, Packer and Koller, In Preparation

[Charts: training objective (≈0–1) and test error (≈0–18%) for FIXED vs. SPMKL]

HOG-Based Model. Dalal and Triggs, 2005

Motif Finding

UniProbe Dataset: ~40,000 sequences, Binding vs. Not-Binding
50/50 train/test split, 5 folds

Motif Finding
Kumar, Packer and Koller, NIPS 2010

[Charts: training objective (≈69–78) and test error (≈8.5–11.5%) for FIXED vs. SPMKL]

Motif + Markov Background Model. Yu and Joachims, 2009

Outline

• Two Types of Problems
• Latent SVM (Background)
• Self-Paced Learning
• Max-Margin Min-Entropy Models
• Discussion

MAP Inference

Pr(a, h | x) = exp(w^T Φ(x, a, h)) / Z(x)

Pr(a1, h | x):          Pr(a2, h | x):
0.00  0.00  0.25        0.00  0.00  0.01
0.00  0.25  0.00        0.00  0.24  0.00
0.00  0.00  0.25        0.00  0.00  0.00
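A quick numeric check of the two distributions above (the matrices are taken directly from the slide): a1 carries three times the total mass of a2, yet its largest single cell (0.25) barely beats a2's (0.24), and that single-cell comparison is all that joint MAP inference sees.

```python
import numpy as np

P_a1 = np.array([[0.00, 0.00, 0.25],
                 [0.00, 0.25, 0.00],
                 [0.00, 0.00, 0.25]])
P_a2 = np.array([[0.00, 0.00, 0.01],
                 [0.00, 0.24, 0.00],
                 [0.00, 0.00, 0.00]])

print(P_a1.sum(), P_a2.sum())   # marginals Pr(a|x): 0.75 vs 0.25
print(P_a1.max(), P_a2.max())   # best joint cells:  0.25 vs 0.24
```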

MAP inference, min_{a,h} −log Pr(a, h | x), commits to a single (a, h) pair. But what value should the latent variable take?

Min-Entropy Inference

min_a [ −log Pr(a | x) + Hα(Pr(h | a, x)) ]  =  min_a Hα(Q(a; x, w))

Q(a; x, w) = set of all {Pr(a, h | x)}, with Pr(a, h | x) = exp(w^T Φ(x, a, h)) / Z(x)

Hα = Rényi entropy of the generalized distribution
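A minimal numeric sketch of min-entropy inference. The formula used below for the Rényi entropy of a generalized (unnormalized) distribution, Hα(Q) = log(Σ_h q_h^α / Σ_h q_h) / (1 − α), is an assumption chosen so that it decomposes as −log Pr(a|x) + Hα(Pr(h|a, x)), matching the slide; the two distributions are the ones from the MAP-inference example.

```python
import numpy as np

def renyi_entropy(q, alpha):
    """H_alpha of a generalized (unnormalized) distribution q (assumed form)."""
    q = np.asarray(q, dtype=float)
    if np.isinf(alpha):
        return -np.log(q.max())    # alpha -> infinity limit: -log max_h q_h
    return np.log((q ** alpha).sum() / q.sum()) / (1.0 - alpha)

# Generalized distributions Q(a; x, w) from the MAP-inference example,
# keeping only the nonzero entries of Pr(a, h | x).
Q = {"a1": [0.25, 0.25, 0.25], "a2": [0.01, 0.24]}

# Min-entropy inference: pick the annotation whose generalized
# distribution has the smallest Renyi entropy.
for alpha in (1.5, 10.0, np.inf):
    best = min(Q, key=lambda a: renyi_entropy(Q[a], alpha))
    print(alpha, best, {a: round(renyi_entropy(Q[a], alpha), 3) for a in Q})
```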

Max-Margin Min-Entropy Models
Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

min_w ||w||^2 + C Σ_i ξ_i
s.t. Hα(Q(a; x_i, w)) − Hα(Q(a_i; x_i, w)) ≥ Δ(a_i, a) − ξ_i,   ξ_i ≥ 0

Like latent SVM, minimizes Δ(a_i, a_i(w))

In fact, when α = ∞ the constraints become

max_h w^T Φ(x_i, a_i, h) − max_h w^T Φ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i,   ξ_i ≥ 0

i.e., latent SVM is recovered as the special case α = ∞.

Image Classification

Mammals Dataset: 271 images, 6 classes
90/10 train/test split, 5 folds

Image Classification
Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

[Results charts]

HOG-Based Model. Dalal and Triggs, 2005

Motif Finding

UniProbe Dataset: ~40,000 sequences, Binding vs. Not-Binding
50/50 train/test split, 5 folds

Motif Finding
Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

[Results charts]

Motif + Markov Background Model. Yu and Joachims, 2009

Outline

• Two Types of Problems
• Latent SVM (Background)
• Self-Paced Learning
• Max-Margin Min-Entropy Models
• Discussion

Very Large Datasets

• Initialize parameters using supervised data

• Impute latent variables (inference)

• Select easy samples (very efficient)

• Update parameters using incremental SVM

• Refine efficiently with proximal regularization (see the sketch below)
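A high-level skeleton of that recipe, purely illustrative: every component here (the incremental SVM trainer, the imputation and selection rules, the proximal refinement) is an assumed placeholder passed in as a function, not the talk's actual system.

```python
def large_scale_spl(supervised_data, weak_data, train_svm_incremental,
                    impute, easy, refine_proximal, rounds=10):
    """Skeleton only: all components are injected placeholders."""
    model = train_svm_incremental(None, supervised_data)  # init from supervised data
    for _ in range(rounds):
        latent = [impute(model, x) for x in weak_data]    # impute latent variables
        batch = [(x, h) for x, h in zip(weak_data, latent)
                 if easy(model, x, h)]                    # select easy samples cheaply
        model = train_svm_incremental(model, batch)       # incremental SVM update
        model = refine_proximal(model)                    # proximal-regularized refinement
    return model
```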

Output Mismatch

Loss: Σ_h Prθ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ)

C. R. Rao’s Relative Quadratic Entropy

Minimize over w and θ, alternating between the two.

[Plots: the distribution Prθ(h, a | x) over outputs (a1, h) and (a2, h), shown evolving during the minimization over w and then over θ]

Questions?