Keywords to Visual Categories: Multiple-Instance Learning for Weakly Supervised Object Categorization
Sudheendra Vijayanarasimhan, Kristen Grauman
Dept. of Computer Sciences, University of Texas at Austin


Page 1: Sudheendra Vijayanarasimhan Kristen Grauman Dept. of Computer Sciences

Keywords to Visual Categories: Multiple-Instance Learning for Weakly Supervised Object Categorization

Sudheendra Vijayanarasimhan, Kristen Grauman

Dept. of Computer Sciences, University of Texas at Austin

Page 2:

Learning about images from keyword-based search

Search engines already index images based on their proximity to keywords

+ easy to collect examples automatically

+ lots of data, efficiently indexed

Page 3:

The Challenge

• Many images unrelated to the category may be returned

• More variety in viewpoint, illumination, scale, etc.

• There may be as few as one “good” image from which anything about the category can be learned

[Figure: example results of a Google image search for “Face”, compared with example images from a labeled dataset]

Page 4:

Related work

Cluster to find visual themes (e.g., with topic models such as pLSA, HDA) [Sivic et al. 2005, Fergus et al. 2005, Li et al. 2007]

Apply models known to work well with correctly labeled data [Fergus et al. 2004, Schroff et al. 2007]

Page 5:

Our approach

• A multiple-instance visual category learning scenario that directly obtains discriminative models for specified categories from sets of examples

– Assumes that as little as one example in a set or “bag” of images may be positive

– Obtains a large-margin solution with constraints that accommodate this assumption

– Iteratively improves the multiple-instance classifier by automatically refining the representation

[Vijayanarasimhan & Grauman CVPR 2008]

Page 6:

Multiple-Instance Learning (MIL)

[Figure: traditional supervised learning, where every instance is individually labeled positive or negative, vs. multiple-instance learning, where labels are given only to positive and negative bags of instances] [Dietterich et al. 1997]
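The defining MIL assumption above, that a bag is positive if at least one of its instances is positive, can be sketched in a few lines. The `score` function below is a hypothetical instance-level classifier, not part of the paper:

```python
def score(instance):
    # Hypothetical instance-level decision value: positive above 0.
    return instance - 0.5

def bag_label(bag):
    # MIL rule: a bag's decision is the max over its instances'
    # decisions, so one positive instance makes the whole bag positive.
    return 1 if max(score(x) for x in bag) > 0 else -1

positive_bag = [0.1, 0.2, 0.9]   # one positive instance suffices
negative_bag = [0.1, 0.3, 0.4]   # every instance is negative

print(bag_label(positive_bag))   # 1
print(bag_label(negative_bag))   # -1
```

Taking the max over instance scores is the standard way to turn an instance classifier into a bag classifier under this assumption.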

Page 7:

MIL for Visual Categorization

• Obtain sets or bags of images from independent sources

• Each set should contain at least one good example of the category being learned

• Enforce the MIL constraint to obtain a classifier under which at least one example in each bag is classified as positive

Page 8:

Sparse MIL

Let $\mathcal{X}^+$ denote the set of positive bags and $\mathcal{X}^-$ denote the set of negative bags.

Let $X$ denote a bag, $x$ denote an instance, and let $X_n = \bigcup_{X \in \mathcal{X}^-} X$ be the set of all negative instances.

Page 9:

Sparse MIL

To begin, we solve a large-margin decision problem with constraints as suggested in [Bunescu & Mooney, 2007]:

$$\min_{w,\,b,\,\xi \geq 0} \ \frac{1}{2}\|w\|^2 + \frac{C}{|X_n|}\sum_{x \in X_n}\xi_x + \frac{C}{|\mathcal{X}^+|}\sum_{X \in \mathcal{X}^+}\xi_X$$

subject to

$$\langle w, x \rangle + b \leq -1 + \xi_x \quad \forall x \in X_n,$$

$$\frac{1}{|X|}\Big\langle w, \sum_{x \in X} x \Big\rangle + b \geq \frac{2 - |X|}{|X|} - \xi_X \quad \forall X \in \mathcal{X}^+,$$

where $\xi_x$ and $\xi_X$ are slack variables for negative instances and positive bags, respectively.
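The distinctive part of this sMIL formulation is the positive-bag constraint: a bag is represented by the mean of its instances, and its target margin (2 − |X|)/|X| weakens as the bag grows, since a large bag may contain just one true positive. A pure-Python sketch (function and variable names are illustrative, not from the paper's code):

```python
def bag_mean(bag):
    # Represent a bag by the mean of its instance feature vectors.
    dim = len(bag[0])
    return [sum(x[i] for x in bag) / len(bag) for i in range(dim)]

def positive_bag_slack(w, b, bag):
    # Slack needed for this bag to satisfy the sMIL constraint
    #   <w, mean(X)> + b >= (2 - |X|) / |X| - xi_X
    m = bag_mean(bag)
    decision = sum(wi * mi for wi, mi in zip(w, m)) + b
    target = (2 - len(bag)) / len(bag)
    return max(0.0, target - decision)

# A 2-instance bag: the target margin is (2 - 2) / 2 = 0.
w, b = [1.0, 0.0], 0.0
bag = [[0.5, 0.2], [-0.1, 0.3]]       # mean [0.2, 0.25], decision 0.2
print(positive_bag_slack(w, b, bag))  # 0.0 (constraint satisfied)
```

In the full optimization these slacks enter the objective with weight C/|X⁺|, while negative instances are constrained individually as in a standard SVM.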

Page 10:

Iterative MIL category learning

[Diagram: positive and negative bags → compute the optimal hyperplane with sparse MIL → re-weight positive instances → repeat]
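One way the re-weighting step of this loop might look: after training, score each instance in a positive bag with the current classifier and let higher-scoring instances dominate the bag's representation before retraining. The softmax weighting below is an illustrative choice, not the paper's exact scheme, and the linear scorer stands in for the learned sMIL classifier:

```python
import math

def instance_score(w, x):
    # Stand-in linear decision value for an instance.
    return sum(wi * xi for wi, xi in zip(w, x))

def reweight_bag(w, bag):
    # Emphasize instances the current classifier believes are positive:
    # softmax weights over decision values, then a weighted bag mean.
    scores = [instance_score(w, x) for x in bag]
    z = sum(math.exp(s) for s in scores)
    weights = [math.exp(s) / z for s in scores]
    dim = len(bag[0])
    return [sum(wt * x[i] for wt, x in zip(weights, bag)) for i in range(dim)]

# One refinement step: the high-scoring instance dominates the bag.
w = [1.0, 0.0]
bag = [[3.0, 0.0], [-3.0, 0.0]]
print(reweight_bag(w, bag))  # close to [3.0, 0.0]
```

Iterating train → re-weight → retrain lets the bag representation converge toward the true positive instances rather than the bag average.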

Page 11:

Training phase

[Diagram: keyword searches for “face” and its translations (“faccia”, “visage”) provide positive bags 1 through N; keyword search results on other categories provide negative bags 1 through N. Each image is represented as a bag of words (1000 words) built from SIFT descriptors at Hessian affine interest points; an sMIL-SVM trained on these bags produces the category model.]
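The bag-of-words step in this pipeline assigns each local descriptor (SIFT in the paper) to its nearest of 1000 visual words and represents the image as a histogram of word counts. A toy sketch with 2-D "descriptors" and a 3-word vocabulary as stand-ins:

```python
def nearest_word(desc, vocab):
    # Index of the closest cluster center (squared Euclidean distance).
    dists = [sum((d - c) ** 2 for d, c in zip(desc, center))
             for center in vocab]
    return dists.index(min(dists))

def bag_of_words(descriptors, vocab):
    # Histogram of visual-word counts over all descriptors in an image.
    hist = [0] * len(vocab)
    for desc in descriptors:
        hist[nearest_word(desc, vocab)] += 1
    return hist

vocab = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]            # 3 visual words
descriptors = [[0.1, 0.1], [0.9, 0.1], [0.1, 0.9], [0.8, 0.2]]
print(bag_of_words(descriptors, vocab))                 # [1, 2, 1]
```

In the real system the vocabulary comes from clustering a large sample of SIFT descriptors (e.g., with k-means), and the histograms feed the kernel of the sMIL-SVM.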

Page 12:

Example bags (“Spam” category)

Engine   Language   Bag
Google   English    [images]
Google   French     [images]
Google   German     [images]
Yahoo    English    [images]
MSN      English    [images]
MSN      French     [images]
MSN      German     [images]

Page 13:

Test phase

[Figure: test images with sMIL decision values -1.01, -1.03, -0.991, -1.01, -0.97, -0.95, -0.99, -0.98]

Page 14:

Official results (What worked)

(sMIL Spam Filter)

Page 15:

Official results (What did not)

• Saucepan
• CD “Retrospective” by Django Reinhardt
• Book “Paris to the Moon” by Adam Gopnik
• Remote control
• Digital camera

Note possible confusion between remote control keypad and fax machine keypad.

Page 16:

Unofficial results

The contest allowed 30 minutes to detect the objects, but our program took 37 minutes to finish. Once the program completed, these were the remainder of the results…

Page 17:

Unofficial results

Page 18:

Unofficial results

Page 19:

Unofficial results

Page 20:

Practice round results

• Upright vacuum cleaner
• Brown pen
• Nescafe Taster’s Choice
• Pellegrino bottle
• Pringles
• Red sport bottle

Results from a thirty-minute preliminary trial run the previous night by the organizers.

Page 21:

Qualification round results

• Electric iron
• Upright vacuum cleaner
• Scientific calculator
• Book “Harry Potter and the Deathly Hallows”
• Lindt Madagaskar
• Twix candy bar
• DVD “Shrek”
• DVD “Gladiator”
• Red bell pepper
• Ritter Sport marzipan
• Tide detergent

Page 22:

[Bar chart: recognition accuracy (%) on the Caltech-4 benchmark data set, comparing sparse MIL with the fully supervised techniques of Fergus et al. 2003 and Opelt et al. 2004]

Results on benchmark datasets: supervised vs. unsupervised

Page 23:

Problems encountered

• Only a small set of windows could be sampled due to the time constraint

• Feature extraction was computationally expensive and took up more than half the allotted time

• Because web images are only partially consistent, the learned model may not always correspond to the true object bounding box

Page 24:

Possible Extensions

• Region based segmentation

• Use saliency operator to identify interesting regions

• Capture geometric constraints explicitly

Page 25:

Summary

Main idea:
• learn discriminative classifiers directly from keyword search returns
• novel iterative refinement technique to simultaneously improve both classifier and representation

Advantages:
• allows direct specification of categories of interest
• flexible to choice of kernel and features
• obtains consistent detections on a number of datasets using simple features under the same model settings

Page 26:
Page 27:

Extra slides

Page 28:

SRVC Training phase details

• 15 images per bag
• 3 different search engines and languages
• SIFT features on Hessian affine interest points
• Cluster features into 1000 “visual words”
• Each image represented as a bag of “visual words”
• RBF kernel with fixed kernel parameters
Page 29:

Keywords to visual categories

• Once we learn to discriminate classes from the noisy, ambiguous training examples:
– Recognize novel instances of those classes
– Re-rank the Web image search returns according to expected image content relevance
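The re-ranking step described above amounts to scoring each returned image with the learned classifier and sorting by decision value so the most category-relevant images come first. A sketch where a simple linear scorer stands in for the learned sMIL-SVM, and all names are illustrative:

```python
def rerank(images, w, b=0.0):
    # `images` maps an image id to its feature vector; sort ids by the
    # classifier's decision value, most relevant first.
    def decision(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return sorted(images, key=lambda name: decision(images[name]),
                  reverse=True)

results = {
    "img_noise": [-0.5, 0.1],
    "img_face1": [0.9, 0.4],
    "img_face2": [0.6, 0.2],
}
print(rerank(results, w=[1.0, 1.0]))
# ['img_face1', 'img_face2', 'img_noise']
```

The same decision values that classify novel instances thus double as a relevance ordering over the raw search returns.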

Page 30:

Comparison with existing results

[Charts: accuracy for re-ranking Google images (average precision at 15% recall) and accuracy for re-ranking Animals images (average precision at 100-image recall)]

Using image content alone, our approach provides accurate re-ranking results, and for some classes improves precision more than methods employing both text and image features.

Page 31:

Aren’t slack variables enough?

[Plot: classification accuracy vs. sparsity (% of negatives in positive bags, from 0 to 80) on the Caltech-7 classes]

MIL is better suited to the sparse, noisy training data than the SIL baseline, and degrades much more gradually when given fewer true positives.

Page 32:

Iterative refinement