WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES

Prasad Gabbur, Kobus Barnard

University of Arizona

Overview

Word-prediction using translation model for object recognition

Feature evaluation

Segmentation evaluation

Modifications to Normalized Cuts segmentation algorithm

Evaluation of color constancy algorithms

Effects of illumination color change on object recognition

Strategies to deal with illumination color change

Low-level computer vision algorithms: segmentation, edge detection, feature extraction, etc.

Building blocks of computer vision systems

Is there a generic task to evaluate these algorithms quantitatively?

Word-prediction using translation model for object recognition

Sufficiently general

Quantitative evaluation is possible

Motivation

Translation model for object recognition

Translate from visual to semantic description

Approach

Model joint probability distribution of visual representations and associated words using a large, annotated image collection.

Corel database

Image pre-processing

sun sky waves sea

visual features

Segmentation*

* Thanks to N-cuts team [Shi, Tal, Malik] for their segmentation algorithm

[f1 f2 f3 …. fN]

Joint distribution

$$P(w \mid b) = \frac{\sum_{l} P(w \mid l)\, P(b \mid l)\, P(l)}{P(b)}$$

word

blob

joint visual/textual concepts *

Learn P(w | l), P(b | l), and P(l) from data using EM

Node l

P(w | l): Frequency table

P(b | l): Gaussian over features

* Barnard et al JMLR 2003
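As a rough illustration of the joint distribution on this slide, the sketch below computes P(w | b) for one blob by summing over nodes l. It assumes the parameters have already been learned (the EM step is not shown), uses a diagonal-covariance Gaussian for P(b | l), and the node contents, function names, and toy numbers are all hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at feature vector x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * math.log(2 * math.pi * vi) - (xi - mi) ** 2 / (2 * vi)
    return math.exp(log_p)

def word_posterior(blob, nodes):
    """Return P(w | b) for one blob, summing over concept nodes l.

    Each node is a dict with 'prior' (P(l)), 'words' (frequency table
    P(w | l)), and 'mean' / 'var' (Gaussian P(b | l)).
    """
    joint = {}       # accumulates sum_l P(w | l) P(b | l) P(l)
    evidence = 0.0   # accumulates P(b) = sum_l P(b | l) P(l)
    for node in nodes:
        weight = gaussian_pdf(blob, node["mean"], node["var"]) * node["prior"]
        evidence += weight
        for word, p in node["words"].items():
            joint[word] = joint.get(word, 0.0) + p * weight
    return {word: value / evidence for word, value in joint.items()}

# Toy model: two hypothetical nodes over a 2-D feature vector.
NODES = [
    {"prior": 0.5, "words": {"sky": 0.8, "sea": 0.2},
     "mean": [0.0, 0.0], "var": [1.0, 1.0]},
    {"prior": 0.5, "words": {"sky": 0.1, "sea": 0.9},
     "mean": [3.0, 3.0], "var": [1.0, 1.0]},
]
posterior = word_posterior([0.1, -0.2], NODES)
```

Because each frequency table sums to one, the returned posterior also sums to one; a blob close to the first node's mean is dominated by that node's words.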

Annotating images

Segment image

Compute P(w|b) for each region

Sum over regions

[Figure: image segmented into regions b1, b2, . . .]

P(w|image) ∝ P(w|b1) + P(w|b2) + . . .
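The three annotation steps (segment, compute P(w|b) per region, sum over regions) can be sketched as follows. The per-region posteriors are taken as given; normalising the sum so that it is a distribution, the `n_words` cutoff, and the toy numbers are assumptions made for illustration.

```python
def annotate(region_posteriors, n_words=4):
    """Sum P(w | b) over regions, normalise, and return the top words."""
    totals = {}
    for posterior in region_posteriors:
        for word, p in posterior.items():
            totals[word] = totals.get(word, 0.0) + p
    norm = sum(totals.values())
    image_posterior = {w: p / norm for w, p in totals.items()}
    # Rank words by their image-level probability, highest first.
    ranked = sorted(image_posterior, key=image_posterior.get, reverse=True)
    return ranked[:n_words], image_posterior

# Two hypothetical regions with made-up word posteriors.
REGIONS = [
    {"cat": 0.6, "grass": 0.3, "water": 0.1},
    {"grass": 0.7, "forest": 0.2, "cat": 0.1},
]
words, image_posterior = annotate(REGIONS, n_words=2)
```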

CAT TIGER GRASS FOREST

Predicted Words

Actual Keywords

CAT HORSE GRASS WATER

Measuring performance

• Record percent correct

• Use annotation performance as a proxy for recognition

• Large region-labeled databases are not available

• Large annotated databases are available
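A minimal sketch of "record percent correct", assuming the score is simply the fraction of predicted words that occur among the actual keywords. The function name is hypothetical, and the two word lists are the ones shown on the slide; since both lists have four words, the overlap score is the same whichever list is treated as the prediction.

```python
def percent_correct(predicted, actual):
    """Fraction of predicted words that appear among the actual keywords."""
    if not predicted:
        return 0.0
    actual_set = set(actual)
    hits = sum(1 for word in predicted if word in actual_set)
    return hits / len(predicted)

score = percent_correct(
    ["CAT", "TIGER", "GRASS", "FOREST"],
    ["CAT", "HORSE", "GRASS", "WATER"],
)
```

Two of the four words (CAT and GRASS) match, so this pair scores 0.5.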

160 CDs in total

80 CDs: 75% Training, 25% Test

80 CDs: Novel

Experimental protocol

Sampling scheme: each CD contains 100 images on one specific topic, such as “aircraft”

Average results over 10 different samplings
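One way to picture the sampling scheme is sketched below. It assumes that the 75% / 25% split is made within each of the 80 sampled CDs and that images are identified by (CD, index) pairs; neither detail is stated on the slide, and `sample_split`, `average_over_samplings`, and the `evaluate` callback are hypothetical names.

```python
import random

def sample_split(n_cds=160, images_per_cd=100, train_fraction=0.75, seed=0):
    """Split CDs into training, held-out test, and novel image lists."""
    rng = random.Random(seed)
    cds = list(range(n_cds))
    rng.shuffle(cds)
    seen_cds, novel_cds = cds[: n_cds // 2], cds[n_cds // 2:]
    train, held_out = [], []
    for cd in seen_cds:
        images = [(cd, i) for i in range(images_per_cd)]
        rng.shuffle(images)
        cut = int(train_fraction * images_per_cd)
        train.extend(images[:cut])      # 75% of each sampled CD
        held_out.extend(images[cut:])   # remaining 25% of the same CD
    # Novel images come from CDs never seen during training.
    novel = [(cd, i) for cd in novel_cds for i in range(images_per_cd)]
    return train, held_out, novel

def average_over_samplings(evaluate, n_samplings=10):
    """Average a user-supplied score over several independent samplings."""
    scores = [evaluate(*sample_split(seed=s)) for s in range(n_samplings)]
    return sum(scores) / len(scores)

train, held_out, novel = sample_split()
```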

Corel database

Semantic evaluation of vision processes

Feature sets

• Combinations of visual features

Segmentation methods

• Mean-Shift [Comaniciu, Meer]

• Normalized Cuts [Shi, Tal, Malik]

Color constancy algorithms

• Train with illumination change

• Color constancy processing: Gray-world, Scale-by-max

Feature evaluation

Features

Size

Location

Shape

• Second moment

• Compactness

• Convexity

• Outer boundary descriptor

Color

(RGB, L*a*b, rgS)

• Average color

• Standard deviation

Texture

Responses to a bank of filters

• Even and Odd symmetric

• Rotationally symmetric (DOG)

Context

(Average surrounding color)
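To make a few of the listed features concrete, the sketch below computes size, location, average color, and color standard deviation for one region. The pixel representation, the normalisation by image dimensions, and the function name are assumptions; shape, texture, and context features are not covered.

```python
import math

def region_features(pixels, image_width, image_height):
    """Basic features for one region.

    pixels: list of (x, y, (r, g, b)) tuples belonging to the region.
    """
    n = len(pixels)
    size = n / (image_width * image_height)            # fraction of image area
    cx = sum(p[0] for p in pixels) / n / image_width   # normalised centroid x
    cy = sum(p[1] for p in pixels) / n / image_height  # normalised centroid y
    mean = [sum(p[2][c] for p in pixels) / n for c in range(3)]
    std = [
        math.sqrt(sum((p[2][c] - mean[c]) ** 2 for p in pixels) / n)
        for c in range(3)
    ]
    return {"size": size, "location": (cx, cy),
            "mean_color": mean, "std_color": std}

# A 2x2 region in the corner of a hypothetical 4x4 image.
REGION = [
    (0, 0, (10, 20, 30)), (1, 0, (10, 20, 30)),
    (0, 1, (30, 20, 30)), (1, 1, (30, 20, 30)),
]
features = region_features(REGION, image_width=4, image_height=4)
```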

Feature evaluation

Base = Size + Location + Second moment + Compactness

[Figure: annotation performance (bigger is better) for Base, +Color, +Texture, and +Shape on Training, Held out, and Novel data; vertical axis from 0 to 0.12]


Mean Shift

(Comaniciu, Meer)

Normalized Cuts (N-Cuts)

(Shi, Tal, Malik)


• Performance depends on number of regions used for annotation

• Mean Shift is better than N-Cuts for # regions < 6

[Figure: annotation performance (bigger is better) versus # regions]

Normalized Cuts

• Graph partitioning technique

• Bi-partitions an edge-weighted graph in an optimal sense

• Normalized cut (Ncut) is the optimizing criterion

[Figure: nodes i and j joined by an edge of weight w_ij; the graph is split into parts A and B]

Edge weight => Similarity between i and j

Minimize Ncut(A,B)

• Image segmentation

• Each pixel is a node

• Edge weight is similarity between pixels

• Similarity based on color, texture and contour cues
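For reference, the normalized cut criterion of Shi and Malik is Ncut(A,B) = cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V), where cut(A,B) sums the weights of edges crossing the partition and assoc(A,V) sums the weights from A to all nodes. The sketch below only evaluates this value for a given bi-partition of a tiny graph; it does not implement the spectral solver used to find the minimum, and the weight matrix is made up.

```python
def ncut(weights, part_a):
    """Ncut(A, B) for a symmetric weight matrix and a set of node indices A."""
    nodes = range(len(weights))
    a = set(part_a)
    b = set(nodes) - a
    cut = sum(weights[i][j] for i in a for j in b)            # edges crossing A|B
    assoc_a = sum(weights[i][j] for i in a for j in nodes)    # A to all nodes
    assoc_b = sum(weights[i][j] for i in b for j in nodes)    # B to all nodes
    return cut / assoc_a + cut / assoc_b

# Four nodes: {0, 1} and {2, 3} are strongly similar within each pair.
W = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
]
good = ncut(W, {0, 1})   # cuts only the two weak edges
bad = ncut(W, {0, 2})    # cuts both strong edges
```

The partition that separates the two tightly connected pairs has the smaller Ncut value, which is the behaviour the criterion is designed to reward.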

Normalized Cuts

Original algorithm

[Figure: pixel-pixel stage gives the initial seg; region-region stage gives the final seg]

Produces splits in homogeneous regions, e.g., “sky”

– Local connectivity between pixels

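[Figure: Preseg, then a region-region stage, giving Seg]

Meta-segmentation

[Figure: Preseg, then region-region Iteration 1 through Iteration n]

$$\hat{W}_{ij} = \frac{1}{T} \sum_{k \in R_i} \sum_{l \in R_j} W_{kl}$$

$$\hat{W}_{ij} = \sum_{k \in R_i} \sum_{l \in R_j} W_{kl}$$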
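The region-region step combines pixel-pixel weights W_kl over all pixels k in region R_i and l in region R_j, either as a plain sum or scaled by 1/T. A minimal sketch of both variants follows; the slide does not define T, so taking T to be the number of pixel pairs |R_i| * |R_j| is purely an assumption, as are the function name and the toy weights.

```python
def region_weights(pixel_weights, regions, average=False):
    """Aggregate pixel-pixel weights W[k][l] into region-region weights.

    regions: list of lists of pixel indices, one list per region R_i.
    average=False sums W_kl over k in R_i and l in R_j.
    average=True divides that sum by T, ASSUMED here to be |R_i| * |R_j|.
    """
    n = len(regions)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = sum(pixel_weights[k][l]
                        for k in regions[i] for l in regions[j])
            if average:
                total /= len(regions[i]) * len(regions[j])
            out[i][j] = total
    return out

# Three pixels grouped into two regions.
W_PIXELS = [
    [0.0, 0.5, 0.2],
    [0.5, 0.0, 0.4],
    [0.2, 0.4, 0.0],
]
REGIONS = [[0, 1], [2]]
summed = region_weights(W_PIXELS, REGIONS)
averaged = region_weights(W_PIXELS, REGIONS, average=True)
```

With a plain sum, larger regions accumulate larger weights simply because they contain more pixel pairs; the scaled variant removes that dependence on region size under the assumption made here.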

Modifications to Normalized Cuts

[Figure: Original and Modified, each showing pixels k and l]

Modifications to Normalized Cuts

[Figure: two side-by-side Original and Modified pairs]

Original vs. Modified

• For # regions < 6, modified out-performs original

• For # regions > 6, original is better

[Figure: annotation performance (bigger is better) versus # regions, original vs. modified]

Incorporating high-level information into segmentation algorithms

Low-level segmenters split up objects (e.g., the black and white halves of a penguin)

Using word-prediction gives a way to incorporate high-level semantic information into segmentation algorithms

Propose a merge between regions that have similar posterior distributions over words
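A sketch of how such a merge proposal might look is given below. The slide does not say how similarity between word posteriors is measured; the Jensen-Shannon divergence, the threshold value, the adjacency list, and the function names are all assumptions chosen for illustration.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two word distributions (dicts)."""
    words = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in words}

    def kl(a):
        # KL divergence from a to the mixture m, skipping zero entries.
        return sum(a[w] * math.log(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

def propose_merges(posteriors, adjacency, threshold=0.1):
    """Return adjacent region pairs whose word posteriors are similar."""
    return [(i, j) for i, j in adjacency
            if js_divergence(posteriors[i], posteriors[j]) < threshold]

# Regions 0 and 1 stand in for the two halves of a penguin; region 2 is sky.
POSTERIORS = [
    {"penguin": 0.8, "snow": 0.2},
    {"penguin": 0.7, "snow": 0.3},
    {"sky": 0.9, "snow": 0.1},
]
merges = propose_merges(POSTERIORS, adjacency=[(0, 1), (1, 2)])
```

Only the pair of regions that both predict "penguin" is proposed for merging, even though the two regions could look very different in color.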

Illumination change

Makes recognition difficult

Illumination color change

Illuminant 1

Illuminant 2

Strategies to deal with illumination change:

• Train for illumination change

• Color constancy pre-processing and normalization

* http://www.cs.sfu.ca/~colour/data

Training

Train for illumination change

Variation of color under expected illumination changes

[Matas et al 1994, Matas 1996, Matas et al 2000]

Algorithm

Unknown illuminant Canonical (reference) illuminant

(Map image as if it were taken under reference illuminant).

Test Input

Recognition system

Training database

Canonical (reference) illuminant

Color constancy pre-processing

[Funt et al 1998]

Algorithm

Unknown illuminant Canonical (reference) illuminant

(Map image as if it were taken under reference illuminant).

Test Input

Recognition system

Normalized training database

Canonical (reference) illuminant

Training database

Algorithm

Color normalization

[Funt and Finlayson 1995, Finlayson et al 1998]

Unknown illuminant

Simulating illumination change

11 illuminants

(0 is canonical)

[Figure: illuminants 0 through 10]

Train with illumination variation

Experiment A
Training: No illumination change
Testing: No illumination change

Experiment B
Training: No illumination change
Testing: Illumination change

Experiment C
Training: Illumination change

[Figure: annotation performance (bigger is better) for experiments A, B, and C on Training, Held-out, and Novel data; vertical axis from 0 to 0.14]


Gray-world

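Algorithm: Mean color = constant

Training: Canonical

Test: Unknown, mapped to Canonical

$$c_r = \frac{\mu_r}{\mu_r^t}\, t_r, \qquad c_g = \frac{\mu_g}{\mu_g^t}\, t_g, \qquad c_b = \frac{\mu_b}{\mu_b^t}\, t_b$$

Here $(\mu_r, \mu_g, \mu_b)$ is the mean color under the canonical illuminant, $(\mu_r^t, \mu_g^t, \mu_b^t)$ is the mean color of the test image, $(t_r, t_g, t_b)$ is a test pixel, and $(c_r, c_g, c_b)$ is the corrected pixel.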
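Gray-world correction ("mean color = constant") can be sketched as a per-channel scaling that makes the image mean match the canonical mean. The pixel-list representation, the function name, and the example values are assumptions; a channel whose mean is zero would need special handling that is not shown.

```python
def gray_world(image, canonical_mean):
    """Scale each channel so the image mean equals the canonical mean.

    image: list of (r, g, b) pixels taken under the unknown illuminant.
    canonical_mean: (r, g, b) mean color under the canonical illuminant.
    """
    n = len(image)
    image_mean = [sum(p[c] for p in image) / n for c in range(3)]
    gains = [canonical_mean[c] / image_mean[c] for c in range(3)]
    return [tuple(p[c] * gains[c] for c in range(3)) for p in image]

# Image mean is (150, 100, 40); the canonical mean is a made-up (75, 100, 80).
IMAGE = [(100, 50, 20), (200, 150, 60)]
corrected = gray_world(IMAGE, canonical_mean=(75, 100, 80))
```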

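Color constancy pre-processing

Scale-by-max

Algorithm: Max color = constant

Training: Canonical

Test: Unknown, mapped to Canonical

$$c_r = \frac{m_r}{m_r^t}\, t_r, \qquad c_g = \frac{m_g}{m_g^t}\, t_g, \qquad c_b = \frac{m_b}{m_b^t}\, t_b$$

Here $(m_r, m_g, m_b)$ is the maximum color under the canonical illuminant, $(m_r^t, m_g^t, m_b^t)$ is the maximum color of the test image, $(t_r, t_g, t_b)$ is a test pixel, and $(c_r, c_g, c_b)$ is the corrected pixel.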
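Scale-by-max ("max color = constant") follows the same pattern as gray-world, with the per-channel maximum in place of the mean. As before, the representation, names, and numbers are illustrative assumptions, and a channel whose maximum is zero is not handled.

```python
def scale_by_max(image, canonical_max):
    """Scale each channel so the image maximum equals the canonical maximum.

    image: list of (r, g, b) pixels taken under the unknown illuminant.
    canonical_max: per-channel maximum under the canonical illuminant.
    """
    image_max = [max(p[c] for p in image) for c in range(3)]
    gains = [canonical_max[c] / image_max[c] for c in range(3)]
    return [tuple(p[c] * gains[c] for c in range(3)) for p in image]

# Image maximum is (200, 150, 60); the canonical maximum is a made-up value.
IMAGE_MAX_EXAMPLE = [(100, 50, 20), (200, 150, 60)]
scaled = scale_by_max(IMAGE_MAX_EXAMPLE, canonical_max=(100, 150, 120))
```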




Others
Training: No illumination change
+ Color constancy algorithm

[Figure: annotation performance (bigger is better) for A, B, Gray-world, and Scale-by-max; vertical axis from 0 to 0.14]




Color normalization

Gray-world
Algorithm: Mean color = constant
Training: Canonical; Test: Unknown

Scale-by-max
Algorithm: Max color = constant
Training: Canonical; Test: Unknown

Color normalization



Others
Training: No illumination change

[Figure: annotation performance (bigger is better) for A, B, Gray-world, and Scale-by-max; vertical axis from 0 to 0.14]


Conclusions

Translation (visual to semantic) model for object recognition

Identify and evaluate low-level vision processes for recognition

Feature evaluation

Color and texture are the most important features, in that order

Shape needs better segmentation methods


Segmentation evaluation

Performance depends on # regions for annotation

Mean Shift and modified NCuts do better than original NCuts for # regions < 6

Color constancy evaluation

Training with illumination change helps

Color constancy processing helps (scale-by-max better than gray-world)

Thank you!
