Using High-Level Visual Information for Color Constancy
Joost van de Weijer, Cordelia Schmid, Jakob Verbeek
LEAR team, INRIA Grenoble, France
http://lear.inrialpes.fr/people/vandeweijer/
Goal
• We aim to improve illuminant estimation by using high-level visual information.
• The most probable illuminant of a scene is the one which, when used to color correct the image, yields the most probable semantic interpretation.
Approach
Biological motivation: from Hansen et al., "Memory modulates color appearance", Nature Neuroscience, 2006.
[Approach diagram: images → pixel classification → semantic interpretability]
Green-Grass hypothesis: the average reflectance of a semantic class in an image is equal to the average reflectance of that class in the database.
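Under that hypothesis, a per-channel (diagonal, von Kries-style) illuminant estimate follows by dividing the observed class average by the database average. A minimal sketch, assuming a pixel mask for one class and a known database mean (function and variable names are hypothetical, not from the poster):

```python
import numpy as np

def class_based_illuminant(img, mask, db_mean_rgb):
    """Diagonal illuminant estimate from one semantic class.

    img: H x W x 3 float image; mask: H x W boolean mask of the class
    pixels; db_mean_rgb: (3,) database-average color of the class.
    """
    img_mean = img[mask].mean(axis=0)   # average observed class color
    e = img_mean / db_mean_rgb          # channel-wise ratio = illuminant
    return e / np.linalg.norm(e)        # normalize to unit length
```

Dividing the corrected image by this estimate maps the class average back to its database value.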
A set of bottom-up illuminant hypotheses is derived from low-level image statistics:
p=1, n=0: the Grey-World hypothesis
p=∞, n=0: the White-Patch hypothesis
p=k, n=0: the Shades-of-Grey hypothesis
p=1, n=1: the Grey-Edge hypothesis
The best bottom-up hypothesis is selected based on the semantic likelihood of the illuminant-corrected image. In the experiments we use n = {0, 1, 2} and p = {2, 12}.
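The whole family above can be written as one Minkowski-norm estimator over image derivatives. A hedged sketch (plain finite differences via `np.gradient` stand in for the smoothed derivatives used in practice, and the function name is hypothetical):

```python
import numpy as np

def estimate_illuminant(img, n=0, p=1.0):
    """Minkowski-norm illuminant estimate (Grey-Edge framework sketch).

    img: H x W x 3 float image; n: derivative order; p: Minkowski norm.
    n=0, p=1 gives Grey-World; n=0, p=inf gives White-Patch (max-RGB);
    n=0, general p gives Shades-of-Grey; n=1, p=1 gives Grey-Edge.
    Returns a unit-norm illuminant estimate (3,).
    """
    f = img.astype(np.float64)
    for _ in range(n):
        f = np.gradient(f, axis=1)  # n-th order derivative along x
    a = np.abs(f).reshape(-1, 3)
    if np.isinf(p):
        e = a.max(axis=0)                       # White-Patch: per-channel max
    else:
        e = (a ** p).mean(axis=0) ** (1.0 / p)  # Minkowski p-norm average
    return e / np.linalg.norm(e)
```

Selecting among the resulting hypotheses by semantic likelihood is the top-down step described above and is not shown here.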
Image representation
• dense extraction of 20x20 pixel patches on a 10x10 pixel grid
• each patch described by discretized features, the visual words:
• texture: SIFT (750 visual words, k-means)
• color: RGB clusters (1000 visual words, k-means)
• position: patch location indicated by its cell in an 8x8 grid
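The representation above reduces to two steps: sample patch locations on a regular grid, and assign each patch descriptor to its nearest k-means center. A minimal sketch, assuming descriptors and a trained codebook are given (function names are hypothetical):

```python
import numpy as np

def patch_grid(h, w, size=20, step=10):
    """Top-left corners of size x size patches on a step-pixel grid."""
    return [(y, x) for y in range(0, h - size + 1, step)
                   for x in range(0, w - size + 1, step)]

def dense_patch_words(descriptors, codebook):
    """Quantize patch descriptors against a k-means codebook.

    descriptors: (num_patches, D); codebook: (K, D) cluster centers.
    Returns the nearest visual-word index for each patch.
    """
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)
```

The same quantization is applied per cue (SIFT for texture, RGB for color), each with its own codebook.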
Multi-modal version of PLSA
• single topic (t) per patch, drawn from image-specific topic weights
• texture, color and position independent given the patch topic

Learning a PLSA model from image-level keywords
• set mixing weight to zero for classes not among the keywords
• allow any non-negative sum-to-one values for the remaining mixing weights
• start EM with uniform assignments of patches to classes
[Graphical model of the multi-modal PLSA: patches, topics, modalities]
J. Verbeek and B. Triggs, "Region Classification with Markov Field Aspect Models", CVPR 2007.
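The keyword constraint above amounts to running EM with some mixing weights clamped to zero: once a class weight is zero, its responsibilities are zero and it stays zero. A simplified single-modality sketch (the multi-modal version multiplies per-cue likelihoods; names are hypothetical):

```python
import numpy as np

def plsa_em_image(words, theta, phi, n_iter=10):
    """Image-level PLSA EM with fixed topic-word distributions.

    words: (N,) visual-word index per patch; theta: (T,) mixing weights,
    zeroed for classes absent from the image keywords; phi: (T, V)
    topic-word distributions. Returns updated theta and per-patch
    responsibilities (N, T).
    """
    for _ in range(n_iter):
        # E-step: responsibility of each topic for each patch
        r = theta[None, :] * phi[:, words].T      # (N, T)
        r /= r.sum(axis=1, keepdims=True)
        # M-step (image level): re-estimate mixing weights; zeros stay zero
        theta = r.mean(axis=0)
    return theta, r
```

In the full model phi is also re-estimated across images during training; at test time it is held fixed as here.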
[Example images with angular error E and pixel classification accuracy C (E=-13.5, C=95%; E=-14.1, C=87%; E=-14.5, C=61%), with per-pixel class maps: grass, sky, cow, face, tree, plane]
Data set: contains both indoor and outdoor scenes from a wide variety of locations. The experiments are performed on a subset of 600 images taken at equal spacing from the set, divided into 320 indoor images (160 training, 160 test) and 280 outdoor images (140 training, 140 test).
Topic-word distributions are learned unsupervised on the texture and position cues (color is ignored in training).
Topic-word distributions are learned in a supervised manner.
Data set: training uses labelled images of the Microsoft Research Cambridge (MSRC) set, together with ten images collected from Google Image search for each class. Test: four images per class, totalling 36 images.
Minkowski-norm framework for the bottom-up illuminant estimates (reconstructed):

$$\left( \sum_{i=1}^{N} \left| \frac{\partial^n f^c(\mathbf{x}_i)}{\partial \mathbf{x}^n} \right|^p \right)^{1/p} = k\, e^c$$

where $f^c$ is color channel $c$ of the image, $N$ the number of pixels, and $e^c$ the illuminant estimate.
Classes: building, grass, tree, cow, sheep, sky, water, face and road.
[Table: results in angular error]
[Table: pixel classification results in %]