Visual Recognition With Humans in the Loop
Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Serge Belongie, Peter Welinder, Pietro Perona
ECCV 2010, Crete, Greece


Page 1: Visual Recognition With Humans in the Loop

1

Visual Recognition With Humans in the Loop

Steve Branson

Catherine Wah

Florian Schroff

Boris Babenko

Serge Belongie

Peter Welinder

Pietro Perona

ECCV 2010, Crete, Greece

Page 2: Visual Recognition With Humans in the Loop

2

What type of bird is this?

Page 3: Visual Recognition With Humans in the Loop

What type of bird is this?

3

…?

Field Guide

Page 4: Visual Recognition With Humans in the Loop

4

What type of bird is this?

Computer Vision

?

Page 5: Visual Recognition With Humans in the Loop

5

What type of bird is this?

Bird?

Computer Vision

Page 6: Visual Recognition With Humans in the Loop

6

What type of bird is this?

Chair? Bottle?

Computer Vision

Page 7: Visual Recognition With Humans in the Loop

7

Parakeet Auklet

• Field guides difficult for average users

• Computer vision doesn’t work perfectly (yet)

• Research mostly on basic-level categories

Page 8: Visual Recognition With Humans in the Loop

8

Visual Recognition With Humans in the Loop

Parakeet Auklet

What kind of bird is this?

Page 9: Visual Recognition With Humans in the Loop

9

Levels of Categorization

Airplane? Chair? Bottle? …

Basic-Level Categories

[Griffin et al. ‘07, Lazebnik et al. ‘06, Grauman et al. ‘06, Everingham et al. ‘06, Felzenzwalb et al. ‘08, Viola et al. ‘01, … ]

Page 10: Visual Recognition With Humans in the Loop

10

Levels of Categorization

American Goldfinch? Indigo Bunting? …

Subordinate Categories

[Belhumeur et al. ‘08 , Nilsback et al. ’08, …]

Page 11: Visual Recognition With Humans in the Loop

11

Levels of Categorization

Yellow Belly? Blue Belly?…

Parts and Attributes

[Farhadi et al. ‘09, Lampert et al. ’09, Kumar et al. ‘09]

Page 12: Visual Recognition With Humans in the Loop

12

Visual 20 Questions Game

Blue Belly? no

Cone-shaped Beak? yes

Striped Wing? yes

American Goldfinch? yes

Hard classification problems can be turned into a sequence of easy ones

Page 13: Visual Recognition With Humans in the Loop

13

Recognition With Humans in the Loop

Computer Vision

Cone-shaped Beak? yes

American Goldfinch? yes

Computer Vision

• Computers: reduce number of required questions
• Humans: drive up accuracy of vision algorithms

Page 14: Visual Recognition With Humans in the Loop

14

Research Agenda

Heavy reliance on human assistance  →  More automated, as computer vision improves

2010: Blue belly? no / Cone-shaped beak? yes / Striped wing? yes / American Goldfinch? yes
2015: Striped wing? yes / American Goldfinch? yes
2025: Fully automatic (American Goldfinch? yes)

Page 15: Visual Recognition With Humans in the Loop

15

Field Guides

www.whatbird.com


Page 17: Visual Recognition With Humans in the Loop

17

Example Questions


Page 23: Visual Recognition With Humans in the Loop

23

Basic Algorithm

Input image (x)

Computer vision estimate: p(c | x)

Question 1 (max expected information gain): Is the belly black?  A: NO (u_1)  →  p(c | x, u_1)

Question 2 (max expected information gain): Is the bill hooked?  A: YES (u_2)  →  p(c | x, u_1, u_2)

Page 24: Visual Recognition With Humans in the Loop

24

Without Computer Vision

Input image (x)

Class prior: p(c)

Question 1 (max expected information gain): Is the belly black?  A: NO (u_1)  →  p(c | u_1)

Question 2 (max expected information gain): Is the bill hooked?  A: YES (u_2)  →  p(c | u_1, u_2)

Page 25: Visual Recognition With Humans in the Loop

25

Basic Algorithm

• Select the next question that maximizes expected information gain
• Easy to compute if we can estimate probabilities of the form p(c | x, u_1, u_2, ..., u_t), where c is the object class, x is the image, and u_1, ..., u_t is the sequence of user responses
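For reference, the "expected information gain" criterion named above is the mutual information between the class and the candidate answer given everything observed so far; a standard way to write it (notation assumed here, not copied from the slides) is:

$$
j^{*} = \arg\max_{j} \; I(c;\, u_j \mid x, U_t)
      = \arg\max_{j} \Big[ H(c \mid x, U_t) - \mathbb{E}_{u_j \sim p(u_j \mid x, U_t)}\, H(c \mid x, U_t, u_j) \Big],
\qquad U_t = (u_1, \ldots, u_t)
$$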

Page 26: Visual Recognition With Humans in the Loop

26

Basic Algorithm

p(c | x, u_1, u_2, ..., u_t) = p(u_1, ..., u_t | c) · p(c | x) / Z

• p(u_1, ..., u_t | c): model of user responses
• p(c | x): computer vision estimate
• Z: normalization factor
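A minimal Python sketch of the posterior update and question-selection loop from the last few slides, assuming a log p(c | x) vector from computer vision and a per-question table of p(u | c); the names and data layout are illustrative, not the authors' code.

```python
import numpy as np

def posterior(log_p_c_given_x, response_model, answers):
    """p(c | x, u_1..u_t) ∝ p(c | x) * prod_i p(u_i | c)."""
    log_post = log_p_c_given_x.copy()                       # log p(c | x)
    for question, answer in answers:
        # response_model[question] is a (num_classes, num_answers) table of p(u | c)
        log_post += np.log(response_model[question][:, answer] + 1e-12)
    log_post -= log_post.max()
    p = np.exp(log_post)
    return p / p.sum()                                      # divide by Z

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def next_question(log_p_c_given_x, response_model, answers, asked):
    """Pick the unasked question with maximum expected information gain."""
    p_c = posterior(log_p_c_given_x, response_model, answers)
    best_q, best_gain = None, -np.inf
    for q, p_u_given_c in response_model.items():
        if q in asked:
            continue
        p_u = p_c @ p_u_given_c                             # predictive p(u | x, answers)
        gain = entropy(p_c)
        for u, pu in enumerate(p_u):
            if pu > 0:
                gain -= pu * entropy(
                    posterior(log_p_c_given_x, response_model, answers + [(q, u)]))
        if gain > best_gain:
            best_q, best_gain = q, gain
    return best_q
```

Running without computer vision (as on the "Without Computer Vision" slide) amounts to passing the log of a uniform or empirical class prior as log_p_c_given_x.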


Page 28: Visual Recognition With Humans in the Loop

28

Modeling User Responses

• Assume: p(u_1, ..., u_t | c) = ∏_{i=1..t} p(u_i | c)
• Estimate p(u_i | c) using Mechanical Turk

[Example: for the question "What is the color of the belly?" on a Pine Grosbeak, histograms of MTurk answers (grey, red, black, white, brown, blue) are collected for each confidence level: Definitely, Probably, Guessing.]
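A rough sketch of estimating the per-class answer distributions p(u_i | c) from Mechanical Turk annotations, with Laplace smoothing so answers never get zero probability; the data layout is an assumption, and confidence levels (Definitely / Probably / Guessing) could be folded into the answer space.

```python
from collections import Counter
import numpy as np

def estimate_response_model(annotations, num_answers, alpha=1.0):
    """annotations: iterable of (class_id, question_id, answer_id) tuples
    collected from MTurk workers. Returns p(u | c) per (class, question)."""
    counts = {}
    for c, q, u in annotations:
        counts.setdefault((c, q), Counter())[u] += 1
    model = {}
    for (c, q), ctr in counts.items():
        probs = np.full(num_answers, alpha)        # Laplace / add-alpha prior
        for u, n in ctr.items():
            probs[u] += n
        model[(c, q)] = probs / probs.sum()        # normalized p(u | c)
    return model
```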

Page 29: Visual Recognition With Humans in the Loop

29

Incorporating Computer Vision

• Use any recognition algorithm that can estimate: p(c|x)

• We experimented with two simple methods:

p(c | x) ∝ exp{m_c(x)}   (1-vs-all SVM, with m_c(x) the SVM score for class c)

p(c | x) ∝ ∏_i p(a_i^c | x)   (attribute-based classification)

[Lampert et al. ’09, Farhadi et al. ‘09]
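The two estimates above can be sketched in a few lines; shapes and names here are assumptions for illustration rather than the paper's implementation.

```python
import numpy as np

def svm_softmax(scores):
    """1-vs-all SVM scores m_c(x) -> p(c | x) ∝ exp{m_c(x)}."""
    z = scores - scores.max()                     # numerical stability
    p = np.exp(z)
    return p / p.sum()

def attribute_based(p_attr_given_x, class_attributes):
    """p(c | x) ∝ prod_i p(a_i = a_i^c | x).
    p_attr_given_x: (num_attrs,) predicted p(a_i = 1 | x)
    class_attributes: (num_classes, num_attrs) binary attribute table."""
    match = (class_attributes * p_attr_given_x
             + (1 - class_attributes) * (1 - p_attr_given_x))
    log_p = np.log(match + 1e-12).sum(axis=1)
    log_p -= log_p.max()
    p = np.exp(log_p)
    return p / p.sum()
```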

Page 30: Visual Recognition With Humans in the Loop

30

Incorporating Computer Vision

[Vedaldi et al. ’08, Vedaldi et al. ’09]

Self Similarity, Color Histograms, Color Layout, Bag of Words, Spatial Pyramid, Geometric Blur, Color SIFT, SIFT

Multiple Kernels

• Used VLFeat and MKL code + color features
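One way to realize the multiple-kernel idea is to combine several precomputed feature kernels and train 1-vs-all SVMs on the result; the sketch below uses a simple uniform average rather than learned MKL weights, and all names are illustrative.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier

def combine_kernels(kernels, weights=None):
    """Weighted sum of precomputed (n x n) Gram matrices, one per feature channel."""
    if weights is None:
        weights = [1.0 / len(kernels)] * len(kernels)   # uniform weighting
    return sum(w * K for w, K in zip(weights, kernels))

def train_one_vs_all(kernels, labels):
    K = combine_kernels(kernels)
    clf = OneVsRestClassifier(SVC(kernel="precomputed"))
    clf.fit(K, labels)
    return clf
```

At test time, the same weights would be applied to the (n_test x n_train) cross-kernels before calling decision_function or predict.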

Page 31: Visual Recognition With Humans in the Loop

31

Birds 200 Dataset

• 200 classes, 6000+ images, 288 binary attributes
• Why birds?

Black-footed Albatross

Groove-Billed Ani

Parakeet Auklet Field Sparrow Vesper Sparrow

Arctic Tern Forster’s Tern Common Tern Baird’s Sparrow Henslow’s Sparrow


Page 34: Visual Recognition With Humans in the Loop

34

Results: Without Computer Vision

Comparing Different User Models

Page 35: Visual Recognition With Humans in the Loop

35

Results: Without Computer Vision

If users' answers agree with field guides: perfect users reach 100% accuracy in about 8 ≈ log2(200) questions (log2(200) ≈ 7.6, so 8 binary questions suffice to distinguish 200 classes).

Page 36: Visual Recognition With Humans in the Loop

36

Results: Without Computer Vision

Real users answer questions

MTurkers don’t always agree with field guides…


Page 38: Visual Recognition With Humans in the Loop

38

Results: Without Computer Vision

Probabilistic User Model: tolerate imperfect user responses

Page 39: Visual Recognition With Humans in the Loop

39

Results: With Computer Vision

Page 40: Visual Recognition With Humans in the Loop

40

Results: With Computer Vision

Users drive performance: 19% → 68%

Just computer vision: 19%

Page 41: Visual Recognition With Humans in the Loop

41

Results: With Computer Vision

Computer Vision Reduces Manual Labor: 11.1 → 6.5 questions

Page 42: Visual Recognition With Humans in the Loop

42

Examples

Different Questions Asked With and Without Computer Vision

Western Grebe

Without computer vision: Q #1: Is the shape perching-like? no (Def.)
With computer vision: Q #1: Is the throat white? yes (Def.)

Page 43: Visual Recognition With Humans in the Loop

43

Examples

User Input Helps Correct Computer Vision

[Example images: Magnolia Warbler and Common Yellowthroat. The user's answer "Is the breast pattern solid? no (definitely)" corrects the computer vision prediction.]

Page 44: Visual Recognition With Humans in the Loop

44

Recognition is Not Always Successful

Acadian Flycatcher

Least Flycatcher

Parakeet Auklet

Least Auklet

Is the belly multi-colored? yes (Def.)

Unlimited questions

Page 45: Visual Recognition With Humans in the Loop

Summary

• Recognition of fine-grained categories
• More reliable than field guides
• Computer vision reduces manual labor: 11.1 → 6.5 questions
• Users drive up performance: 19% → 68%


Page 49: Visual Recognition With Humans in the Loop

49

Future Work

• Extend to domains other than birds
• Methodologies for generating questions
• Improve computer vision

Page 50: Visual Recognition With Humans in the Loop

50

Questions?

Project page and datasets available at:
http://vision.caltech.edu/visipedia/

http://vision.ucsd.edu/project/visipedia/