Page 1

Enhancing Human-Machine Communication via Visual Attributes

Devi Parikh, Virginia Tech

Page 2

Interacting with Vision Systems

User, Supervisor

Page 3

Interacting with Vision Systems

Semantic Gap

Mode of communication is important

Page 4

Interacting with Vision Systems

• Necessary for communication
– Language that humans understand (semantic)
– Language that machines understand (visual)

• Attributes
– Example: furry, natural, chubby, shiny, etc.
– Better features, deeper image understanding, etc. (Farhadi et al., Kumar et al., Lampert et al., etc.)
– Human-machine communication

Page 5

Communicator: Human, Machine

Role of the Human: User, Supervisor

• Image Search: “My missing brother is fuller-faced than this boy.”

• Instilling Domain Knowledge: “Polar bears are white and larger than rabbits.”

• Characterizing Failure Modes: “If the image is blurry or the face is not frontal, I may fail.”

• Interpretable Models: “I think this is a polar bear because this is a white and furry animal.”

• Active and Interactive Learning

• Reading Between the Lines

Page 7

Image Search

Query: “black shoes”

Binary Relevance Feedback

Page 8

Image Search

Query: “black shoes”

“shinier than these”

“more formal than these”

Page 9

Relative Attributes

Openness

Linear ranking function: open

Training

Testing

[Parikh and Grauman, ICCV 2011]
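
A linear ranking function for an attribute such as openness can be illustrated with a small sketch. It assumes training data arrives as ordered pairs (more open, less open) and fits the weights by sub-gradient descent on a pairwise hinge loss. The function name `train_linear_ranker`, the toy features, and the optimizer are assumptions made for illustration; the formulation in the cited paper is not reproduced here.

```python
import numpy as np

def train_linear_ranker(X, ordered_pairs, epochs=500, lr=0.1, reg=1e-3):
    """Fit w so that w @ X[i] > w @ X[j] for every (i, j) in ordered_pairs.

    Hypothetical helper: sub-gradient descent on a pairwise hinge loss
    with margin 1 plus a small L2 penalty.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = reg * w
        for i, j in ordered_pairs:
            diff = X[i] - X[j]
            if w @ diff < 1.0:  # this ordering is violated or inside the margin
                grad -= diff
        w -= lr * grad / max(len(ordered_pairs), 1)
    return w

# Toy image features (made up). Each pair is (more open, less open).
X = np.array([[0.1, 0.5], [0.4, 0.2], [0.7, 0.9], [0.9, 0.1]])
pairs = [(1, 0), (2, 1), (3, 2)]

w = train_linear_ranker(X, pairs)
scores = X @ w                  # predicted relative openness per image
ranking = np.argsort(scores)    # least open first
```

At test time a new image is scored with the same `w`, so any two images can be compared on the attribute without per-image labels.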

Page 10

Image Search

• System has pre-trained relative attribute predictors

• Relevance of image = # constraints satisfied

Feedback: “shinier”, “more formal”
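
The scoring rule above, relevance as the number of feedback constraints an image satisfies, can be sketched as follows. The attribute scores, the `(attribute, reference image, direction)` constraint format, and the function name `relevance` are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def relevance(attr_scores, constraints):
    """Count, per image, how many feedback constraints it satisfies.

    attr_scores: dict mapping attribute name -> array of predicted
                 attribute strengths, one entry per database image.
    constraints: list of (attribute, reference_index, direction) with
                 direction "more" or "less" (hypothetical format).
    """
    n_images = len(next(iter(attr_scores.values())))
    rel = np.zeros(n_images, dtype=int)
    for attr, ref, direction in constraints:
        s = attr_scores[attr]
        if direction == "more":
            rel += (s > s[ref]).astype(int)
        else:
            rel += (s < s[ref]).astype(int)
    return rel

# Made-up predictor outputs for four shoe images.
attr_scores = {
    "shiny":  np.array([0.2, 0.9, 0.5, 0.7]),
    "formal": np.array([0.1, 0.8, 0.6, 0.3]),
}
# "shinier than image 2" and "more formal than image 3"
constraints = [("shiny", 2, "more"), ("formal", 3, "more")]

rel = relevance(attr_scores, constraints)
order = np.argsort(-rel, kind="stable")  # most relevant images first
```

Each further round of feedback appends constraints, so the set of top-ranked images narrows with every iteration.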

Page 11

WhittleSearch

[Figure axes: shiny, formal]

Feedback: “shinier”, “more formal”

Page 12

WhittleSearch

[Figure axes: shiny, formal]

Page 13

WhittleSearch

[Kovashka, Parikh and Grauman, CVPR 2012] (Patent pending)

Page 14

Whittle Search: Demo (Online)

[Prepared by Naman Agrawal, Demo at CVPR 2013]

(Patent pending)

Page 18

Traditional Active Learning

Machine: “Is this a forest?”

Supervisor: “No, this is not a forest.”

Page 19

Classifier Feedback

Machine: “I think this is a forest. What do you think?”

Supervisor: “No, this is too open to be a forest.”

Machine: “Ah! These images must not be forests either then.”

[Images more open than query]

[Parkash and Parikh, ECCV 2012]
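
The inference in this exchange can be sketched in a few lines: if the query image is "too open" to be a forest, every image predicted to be at least as open can also be treated as a non-forest. The openness values and the helper name `ruled_out_by_feedback` are assumptions for illustration; how the cited work weights or uses these propagated labels is not shown here.

```python
import numpy as np

def ruled_out_by_feedback(attr_strength, query_idx):
    """Indices of images at least as strong in the attribute as the query.

    Hypothetical helper: given the feedback "the query is too <attribute>
    to be category C", these images can be added as negatives for C.
    The query itself is included.
    """
    mask = attr_strength >= attr_strength[query_idx]
    return np.flatnonzero(mask)

# Made-up predicted openness for five unlabeled images.
openness = np.array([0.2, 0.6, 0.9, 0.4, 0.7])

# Supervisor: "image 1 is too open to be a forest."
not_forest = ruled_out_by_feedback(openness, query_idx=1)
```

A single statement from the supervisor thus yields several negative labels, where a plain yes/no answer would label only the query.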

Page 20

Classifier Feedback

Pre-trained relative attributes

Page 21

Classifier Feedback

Learn attributes on the fly

Page 22

Classifier Feedback

Machine: “I think this is a forest. What do you think?”

Supervisor: “No, this is too open to be a forest.”

Machine: “Ah! These images must be less open than query.”

[images labeled as forest]
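
Read the other way, the same feedback gives training data for the attribute itself: images already labeled as forest must be less open than the query. A minimal sketch, assuming category labels are stored in a list with `None` for unlabeled images; the helper name `ordering_constraints` and the data layout are hypothetical.

```python
def ordering_constraints(query_idx, category_labels, category):
    """Build (stronger, weaker) attribute pairs from one piece of feedback.

    Hypothetical helper: the feedback "the query is too open to be a
    <category>" implies the query is more open than every image already
    labeled as that category.
    """
    return [
        (query_idx, i)
        for i, label in enumerate(category_labels)
        if label == category and i != query_idx
    ]

# Made-up label state: images 0 and 3 are known forests.
labels = ["forest", None, "coast", "forest", None]

# Supervisor: "image 1 is too open to be a forest."
pairs = ordering_constraints(1, labels, "forest")
```

Pairs of this form are the kind of ordered supervision a ranking function for openness can be trained on, which is one way to read "learn attributes on the fly".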

Page 23

Classifier Feedback

• Learning attributes on the fly
– Start only with unlabeled images (+ a supervisor)
– Categories and attributes learnt from scratch

• Confidence in instances

• Active learning for learning with attributes-based classifier feedback

[Biswas and Parikh, CVPR 2013]

Page 24

Classifier Feedback

[Plot: accuracy vs. number of iterations (x-axis 0 to 300, y-axis 20 to 70). Curves: No attributes-based feedback; Parkash et al. ECCV 2012; Proposed]

Parkash and Parikh ECCV 2012

Biswas and Parikh CVPR 2013

Page 26

WhittleSearch

Query: “black shoes”

“shinier than these”

“more formal than these”

Page 27

Image Search

[Parikh and Grauman, ICCV 2013]

Page 28

Saying the Right Thing

“Smiling more than”

“Not smiling”

• Improved image search, description

[Sadovnik, Gallagher, Parikh and Chen, ICCV 2013]

Page 29

Saliency of Attributes

• Improved image search, zero-shot learning, description

White, furry

Scary, sharp teeth

[Turakhia and Parikh, ICCV 2013]

Page 30

• Accessing user’s intentions for mental image search

• More usable computer vision systems even with their imperfections

• Trustworthy systems: key for effective human-machine teams

• Integrating AI with today’s machine learning tools

• Getting more from what the human says without added human effort

• Enhanced human-machine communication via attributes for improved visual recognition

Page 31

Thank you!