Putting Context into Vision - University of Illinois

Put

ting

Con

text

into

V

isio

n

Der

ek H

oiem

Sep

tem

ber 1

5, 2

004

Que

stio

ns to

Ans

wer

Wha

t is

cont

ext?

How

is c

onte

xt u

sed

in h

uman

vis

ion?

How

is c

onte

xt c

urre

ntly

use

d in

com

pute

r vi

sion

?

Con

clus

ions

Wha

t is

cont

ext?

Any

dat

a or

met

a-da

ta n

ot d

irect

ly

prod

uced

by

the

pres

ence

of a

n ob

ject

Nea

rby

imag

e da

taContext

Wha

t is

cont

ext?

Any

dat

a or

met

a-da

ta n

ot d

irect

ly

prod

uced

by

the

pres

ence

of a

n ob

ject

Nea

rby

imag

e da

taS

cene

info

rmat

ion

Context

Context

Wha

t is

cont

ext?

Any

dat

a or

met

a-da

ta n

ot d

irect

ly

prod

uced

by

the

pres

ence

of a

n ob

ject

Nea

rby

imag

e da

taS

cene

info

rmat

ion

Pre

senc

e, lo

catio

ns o

f oth

er o

bjec

tsTree

How

do

we

use

cont

ext?

Atte

ntio

n

Are

ther

e an

y liv

e fis

h in

this

pic

ture

?

Clu

es fo

r Fun

ctio

n

Wha

t is

this

?

Clu

es fo

r Fun

ctio

n

Wha

t is

this

?

Now

can

you

tell?

Low

-Res

Sce

nes

Wha

t is

this

?

Low

-Res

Sce

nes

Wha

t is

this

?

Now

can

you

tell?

Mor

e Lo

w-R

es

Wha

t are

thes

e bl

obs?

Mor

e Lo

w-R

es

The

sam

e pi

xels

! (a

car)

Why

is c

onte

xt u

sefu

l?

Obj

ects

def

ined

at l

east

par

tially

by

func

tion •Tr

ees

grow

in g

roun

d •

Bird

s ca

n fly

(usu

ally

)•

Doo

r kno

bs h

elp

open

doo

rs

Why

is c

onte

xt u

sefu

l?

Obj

ects

def

ined

at l

east

par

tially

by

func

tion

Con

text

giv

es c

lues

abo

ut fu

nctio

n•

Not

root

ed in

to th

e gr

ound

no

t tre

e•

Obj

ect i

n sk

y {c

loud

, bird

, UFO

, pla

ne,

supe

rman

} •

Doo

r kno

bs a

lway

s on

doo

rs

Why

is c

onte

xt u

sefu

l?

Obj

ects

def

ined

at l

east

par

tially

by

func

tion

Con

text

giv

es c

lues

abo

ut fu

nctio

nO

bjec

ts li

ke s

ome

scen

es b

ette

r tha

n ot

hers •To

ilets

like

bat

hroo

ms

•Fi

sh li

ke w

ater

Why

is c

onte

xt u

sefu

l?

Obj

ects

def

ined

at l

east

par

tially

by

func

tion

Con

text

giv

es c

lues

abo

ut fu

nctio

nO

bjec

ts li

ke s

ome

scen

es b

ette

r tha

n ot

hers

Man

y ob

ject

s ar

e us

ed to

geth

er a

nd,

thus

, ofte

n ap

pear

toge

ther

•K

ettle

and

sto

ve•

Key

boar

d an

d m

onito

r

How

is c

onte

xt u

sed

in

com

pute

r vis

ion?

Nei

ghbo

r-ba

sed

Con

text

Mar

kov

Ran

dom

Fie

ld (M

RF)

in

corp

orat

es c

onte

xtua

l con

stra

ints

site

s (o

r nod

es)

neig

hbor

s

Blo

bs a

nd W

ords

–C

arbo

netto

2004

Nei

ghbo

r-ba

sed

cont

ext (

MR

F) u

sefu

l eve

n w

hen

train

ing

data

is n

ot fu

lly s

uper

vise

d

Lear

ns m

odel

s of

obj

ects

giv

en c

aptio

ned

imag

es

…

Dis

crim

inat

ive

Ran

dom

Fi

elds

–K

umar

200

3U

sing

dat

a su

rrou

ndin

g th

e la

bel s

ite (n

ot ju

st a

t th

e la

bel s

ite) i

mpr

oves

resu

lts

Bui

ldin

gs v

s.

Non

-Bui

ldin

gs

Mul

ti-sc

ale

Con

ditio

nal R

ando

m

Fiel

d (m

CR

F) –

He

2004

Inde

pend

ent d

ata-

base

d la

bels

Raw

imag

e

Loca

l con

text

Sce

ne c

onte

xt

mC

RF

Fina

l dec

isio

n ba

sed

onC

lass

ifica

tion

(loca

l dat

a-ba

sed)

Loca

l lab

els

(wha

t rel

atio

n ne

arby

ob

ject

s ha

ve to

eac

h ot

her

Imag

e-w

ide

labe

ls (c

aptu

res

coar

se

scen

e co

ntex

t)

mC

RF

Res

ults

Nei

ghbo

r-ba

sed

Con

text

R

efer

ence

s

P. C

arbo

netto

, N. F

reita

san

d K

. Bar

nard

. “A

Sta

tistic

al

Mod

el fo

r Gen

eral

Con

text

ual O

bjec

t Rec

ogni

tion,

” E

CC

V, 2

004

S. K

umar

and

M. H

eber

t, “D

iscr

imin

ativ

e R

ando

m

Fiel

ds: A

Dis

crim

inat

ive

Fram

ewor

k fo

r Con

text

ual

Inte

ract

ion

in C

lass

ifica

tion,

” IC

CV

, 200

3

X. H

e, R

. Zem

elan

d M

. Car

reira

-Per

piñá

n, “M

ultis

cale

Con

ditio

nal R

ando

m F

ield

s fo

r Im

age

Labe

ling,

” C

VP

R, 2

004

Sce

ne-b

ased

Con

text

Ave

rage

pic

ture

s co

ntai

ning

hea

ds a

t thr

ee s

cale

s

Con

text

Prim

ing

–To

rral

ba20

01/2

003

Obj

ect

Pre

senc

e

Con

text

(gen

eral

ly ig

nore

d)

Loca

l evi

denc

e

(wha

t eve

ryon

e us

es)

Sca

leLo

catio

nP

ose

Imag

e M

easu

rem

ents

Pos

e an

d S

hape

Prim

ing

Focu

s of

A

ttent

ion

Sca

le

Sel

ectio

nO

bjec

t P

rimin

g

Get

ting

the

Gis

t of a

Sce

ne

Sim

ple

repr

esen

tatio

n

Spe

ctra

l cha

ract

eris

tics

(e.g

., G

abor

filte

rs) w

ith

coar

se d

escr

iptio

n of

spa

tial a

rran

gem

ent

PC

A re

duct

ion

Pro

babi

litie

s m

odel

ed w

ith m

ixtu

re o

f Gau

ssia

ns

(200

3) o

r log

istic

regr

essi

on (M

urph

y 20

03)

Con

text

Prim

ing

Res

ults

Obj

ect P

rese

nce

Focu

s of

Atte

ntio

n

Sca

le S

elec

tion

Sm

all

Larg

e

Usi

ng th

e Fo

rres

t to

See

the

Tree

s –

Mur

phy

(200

3)

Ada

boos

t Pat

ch-B

ased

O

bjec

t Det

ecto

rG

ist o

f Sce

ne(b

oost

ed re

gres

sion

)

Det

ecto

r Con

fiden

ce

at L

ocat

ion/

Sca

leE

xpec

ted

Loca

tion/

Sca

le

of O

bjec

t

Com

bine

(logi

stic

regr

essi

on)

Obj

ect P

roba

bilit

y at

Lo

catio

n/S

cale

Obj

ect D

etec

tion

+ S

cene

C

onte

xt R

esul

ts

Ofte

n do

esn’

t hel

p th

at m

uch

May

be

due

to p

oor u

se o

f con

text

Ass

umes

inde

pend

ence

of c

onte

xt a

nd lo

cal e

vide

nce

Onl

y us

es e

xpec

ted

loca

tion/

scal

e fro

m c

onte

xt

Key

boar

dS

cree

nP

eopl

e

Sce

ne-b

ased

Con

text

R

efer

ence

sE

. Ade

lson

, “O

n S

eein

g S

tuff:

The

Per

cept

ion

of M

ater

ials

by

Hum

ans

and

Mac

hine

s,” S

PIE

, 200

1

B. B

ose

and

E. G

rimso

n, “I

mpr

ovin

g O

bjec

t Cla

ssifi

catio

n in

Far

-Fie

ld

Vid

eo,”

EC

CV

, 200

4

K. M

urph

y, A

. Tor

ralb

aan

d W

. Fre

eman

, “U

sing

the

Forr

est t

o S

ee th

e Tr

ees:

A G

raph

ical

Mod

el R

elat

ing

Feat

ures

, Obj

ect,

and

Sce

nes,

”NIP

S,

2003

U. R

utis

haus

er, D

. Wal

ther

, C. K

och,

and

P. P

eron

a, “I

s bo

ttom

-up

atte

ntio

n us

eful

for o

bjec

t rec

ogni

tion?

,” C

VP

R, 2

004

A. T

orra

lba,

“Con

text

ual P

rimin

g fo

r Obj

ect D

etec

tion,

” IJC

V, 2

003

A. T

orra

lba

and

P. S

inha

, “S

tatis

tical

Con

text

Prim

ing

for O

bjec

t Det

ectio

n,”

ICC

V, 2

001

A. T

orra

lba,

K. M

urph

y, W

. Fre

eman

, and

M. R

ubin

, “C

onte

xt-B

ased

Vis

ion

Sys

tem

for P

lace

and

Obj

ect R

ecog

nitio

n,” I

CC

V, 2

003

Obj

ect-b

ased

Con

text

Mut

ual B

oost

ing

–Fi

nk

(200

3)

Loca

l Win

dow

Con

text

ual W

indo

w

eyes

face

sFi

lters

Obj

ect

Like

lihoo

ds+

+R

aw Im

age

Con

fiden

ce

Mut

ual B

oost

ing

Res

ults

Lear

ned

Feat

ures

Firs

t-Sta

ge C

lass

ifier

(MIT

+CM

U)

Con

text

ual M

odel

s us

ing

BR

Fs–

Torr

alba

2004

Tem

plat

e fe

atur

es

Bui

ld s

truct

ure

of C

RF

usin

g bo

ostin

g

Oth

er o

bjec

ts’ l

ocat

ions

’ lik

elih

oods

pr

opag

ate

thro

ugh

netw

ork

Labe

ling

a S

treet

Sce

ne

Labe

ling

an O

ffice

Sce

ne

F =

loca

l

G =

com

patib

ility

Obj

ect-b

ased

Con

text

R

efer

ence

s

M. F

ink

and

P. P

eron

a, “M

utua

l Boo

stin

g fo

r C

onte

xtua

l Inf

eren

ce,”

NIP

S, 2

003

A. T

orra

lba,

K. M

urph

y, a

nd W

. Fre

eman

, “C

onte

xtua

l M

odel

s fo

r Obj

ect D

etec

tion

usin

g B

oost

ed R

ando

m

Fiel

ds,”

AI M

emo

2004

-013

, 200

4

Wha

t els

e ca

n be

don

e?

Sce

ne S

truct

ure

Impr

ove

unde

rsta

ndin

g of

sce

ne

stru

ctur

eFl

oor,

wal

ls, c

eilin

gS

ky, g

roun

d, ro

ads,

bui

ldin

gs

Sem

antic

s vs

. Low

-leve

l

Low

-Lev

elS

eman

tics

Imag

e S

tatis

tics

Pre

senc

e of

Obj

ect

Hig

h-D

Sta

tistic

sU

sefu

l to

OD H

igh-

D

Imag

e S

tatis

tics

Sem

antic

s

Pre

senc

e of

Obj

ect

Hig

h-D

Low

-D

Hig

h-D

Sta

tistic

sU

sefu

l to

OD H

igh-

D

Put

ting

it al

l tog

ethe

r

Sce

ne G

ist

Con

text

Prim

ing

Bas

ic S

truct

ure

Iden

tific

atio

nN

eigh

bor-B

ased

C

onte

xt

Obj

ect

Det

ectio

n

Obj

ect-B

ased

C

onte

xt

Sce

neR

ecog

nitio

nS

cene

-Bas

ed

Con

text

Sum

mar

y

Nei

ghbo

r-ba

sed

cont

ext

Usi

ng n

earb

y la

bels

ess

entia

l for

“com

plet

e la

belin

g”

task

s

Usi

ng n

earb

y la

bels

use

ful e

ven

with

out c

ompl

etel

y su

perv

ised

trai

ning

dat

a

Usi

ng n

earb

y la

bels

and

near

by d

ata

is b

ette

r tha

n ju

st u

sing

nea

rby

labe

ls

Labe

ls c

an b

e us

ed to

ext

ract

loca

l and

sce

ne c

onte

xt

Sum

mar

y

Sce

ne-b

ased

con

text

“Gis

t” re

pres

enta

tion

suita

ble

for f

ocus

ing

atte

ntio

n or

de

term

inin

g lik

elih

ood

of o

bjec

t pre

senc

e

Sce

ne s

truct

ure

wou

ld p

rovi

de a

dditi

onal

use

ful

info

rmat

ion

(but

diff

icul

t to

extra

ct)

Sce

ne la

bel w

ould

pro

vide

add

ition

al u

sefu

l in

form

atio

n

Sum

mar

y

Obj

ect-b

ased

con

text

Eve

n si

mpl

e m

etho

ds o

f usi

ng o

ther

obj

ects

’ loc

atio

ns

impr

ove

resu

lts (F

ink)

Usi

ng B

RFs

, sys

tem

s ca

n au

tom

atic

ally

lear

n to

find

ea

sier

obj

ects

firs

t and

to u

se th

ose

obje

cts

as

cont

ext f

or o

ther

obj

ects

Con

clus

ions

Gen

eral

Few

obj

ect d

etec

tion

rese

arch

ers

use

cont

ext

Con

text

, whe

n us

ed e

ffect

ivel

y, c

an im

prov

e re

sults

dr

amat

ical

ly

A m

ore

inte

grat

ed a

ppro

ach

to u

se o

f con

text

and

da

ta c

ould

impr

ove

imag

e un

ders

tand

ing

Ref

eren

ces

E. A

dels

on, “

On

See

ing

Stu

ff: T

he P

erce

ptio

n of

Mat

eria

ls b

y H

uman

s an

d M

achi

nes,

” SP

IE,

2001

B. B

ose

and

E. G

rimso

n, “I

mpr

ovin

g O

bjec

t Cla

ssifi

catio

n in

Far

-Fie

ld V

ideo

,” E

CC

V, 2

004

P. C

arbo

netto

, N. F

reita

san

d K

. Bar

nard

. “A

Sta

tistic

al M

odel

for G

ener

al C

onte

xtua

l Obj

ect

Rec

ogni

tion,

” EC

CV

, 200

4M

. Fin

k an

d P

. Per

ona,

“Mut

ual B

oost

ing

for C

onte

xtua

l Inf

eren

ce,”

NIP

S, 2

003

X. H

e, R

. Zem

elan

d M

. Car

reira

-Per

piñá

n, “M

ultis

cale

Con

ditio

nal R

ando

m F

ield

s fo

r Im

age

Labe

ling,

” CV

PR

, 200

4 S

. Kum

ar a

nd M

. Heb

ert,

“Dis

crim

inat

ive

Ran

dom

Fie

lds:

A D

iscr

imin

ativ

e Fr

amew

ork

for

Con

text

ual I

nter

actio

n in

Cla

ssifi

catio

n,” I

CC

V, 2

003

J. L

affe

rty, A

. McC

allu

m a

nd F

. Per

eira

, “C

ondi

tiona

l ran

dom

fiel

ds: P

roba

bilis

tic m

odel

s fo

r se

gmen

ting

and

labe

ling

sequ

ence

dat

a,” I

CM

L, 2

001

K. M

urph

y, A

. Tor

ralb

aan

d W

. Fre

eman

, “U

sing

the

Forre

st to

See

the

Tree

s: A

Gra

phic

al

Mod

el R

elat

ing

Feat

ures

, Obj

ect,

and

Sce

nes,

” NIP

S, 2

003

U. R

utis

haus

er, D

. Wal

ther

, C. K

och,

and

P. P

eron

a, “I

s bo

ttom

-up

atte

ntio

n us

eful

for o

bjec

t re

cogn

ition

?,”

CV

PR

, 200

4A

. Tor

ralb

a, “C

onte

xtua

l Prim

ing

for O

bjec

t Det

ectio

n,” I

JCV

, 200

3A

. Tor

ralb

aan

d P

. Sin

ha, “

Stat

istic

al C

onte

xt P

rimin

g fo

r Obj

ect D

etec

tion,

” IC

CV

, 200

1A

. Tor

ralb

a, K

. Mur

phy,

and

W. F

reem

an, “

Con

text

ual M

odel

s fo

r Obj

ect D

etec

tion

usin

g B

oost

ed R

ando

m F

ield

s,” A

I Mem

o 20

04-0

13, 2

004

A. T

orra

lba,

K. M

urph

y, W

. Fre

eman

, and

M. R

ubin

, “C

onte

xt-B

ased

Vis

ion

Sys

tem

for P

lace

an

d O

bjec

t Rec

ogni

tion,

” IC

CV

, 200

3

Documents

Putting Context into Vision - University of Illinois