
A slide-deck of my graduate thesis presentation at IIT-Bombay (2006).

Page 1: Categorization of natural images

Novel Approaches to Natural Scene Categorization

Amit Prabhudesai

Roll No. 04307002

[email protected]

M.Tech Thesis Defence

Under the guidance of

Prof. Subhasis Chaudhuri

Indian Institute of Technology, Bombay

Natural Scene Categorization – p.1/32

Page 2: Categorization of natural images

Overview of topics to be covered

• Natural Scene Categorization: challenges
• Our contribution
  ◦ Qualitative visual environment description
    • Portable, real-time system to aid the visually impaired
    • The system has peripheral vision!
  ◦ Model-based approaches
    • Use of stochastic models to capture semantics
    • pLSA and maximum entropy models
• Conclusions and future work

Natural Scene Categorization – p.2/32

Page 3: Categorization of natural images

Natural Scene Categorization

• An interesting application of a CBIR system
• Images from a broad image domain: diverse and often ambiguous
• Bridging the semantic gap
• Grouping scenes into semantically meaningful categories could aid further retrieval
• Efficient schemes needed for grouping images into semantic categories

Natural Scene Categorization – p.3/32

Page 4: Categorization of natural images

Qualitative Visual Environment Retrieval

[Figure: annotated omnidirectional view showing SKY, BUILDING, WOODS, LAWN and WATER BODY regions, view sectors LT, RT, LB, RB, FR, and positions P1, P2, P3]

• Use of omnidirectional images
• Challenges
  ◦ Unstructured environment
  ◦ No prior learning (unlike navigation/localization)
• Target application and objective
  ◦ The wearable-computing community, with emphasis on visually challenged people
  ◦ Real-time operation

Natural Scene Categorization – p.4/32

Page 5: Categorization of natural images

Qualitative Visual Environment System: Overview

• Environment representation
• Environment retrieval
  ◦ View partitioning
  ◦ Feature extraction
  ◦ Node annotation
  ◦ Dynamic node annotation
  ◦ Real-time operation

• Results

Natural Scene Categorization – p.5/32

Page 6: Categorization of natural images

System Overview (contd.)

• Environment representation
  ◦ Image database containing images belonging to 6 classes: Lawns (L), Woods (W), Buildings (B), Water-bodies (H), Roads (R) and Traffic (T)
  ◦ Moderately large intra-class variance (in the feature space) in images of each category
  ◦ Description relative to the person using the system: e.g., ‘to the left of’, ‘in front’, etc.
  ◦ Topological relationships indicated by a graph
  ◦ Each node annotated by an identifier associated with a class

Natural Scene Categorization – p.6/32

Page 7: Categorization of natural images

System Overview (contd.)

• Environment retrieval
  ◦ View partitioning

[Figure: view partitioning of the omnidirectional image into sectors LT, RT, LB, RB, FR, BS and XX along the forward/backward direction, and the corresponding graphical representation]

  ◦ Feature extraction
    • The feature must be invariant to scaling, viewpoint, illumination changes, and the geometric warping introduced by omnicam images
    • Colour histogram selected as the feature for performing CBIR

Natural Scene Categorization – p.7/32

Page 9: Categorization of natural images

System Overview (contd.)

• Environment retrieval
  ◦ Node annotation
    • Objective: retrieval robust to illumination changes and intra-class variations
    • Solution: annotation decided by a simple voting scheme
  ◦ Dynamic node annotation
    • Temporal evolution of the graph Gn with time tn
    • Complete temporal evolution of the graph given by G, obtained by concatenating the subgraphs Gn, i.e., G = {G1, G2, . . . , Gk, . . .}
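The voting-based annotation can be sketched as follows. This is a minimal illustration, not the thesis code: the database of pre-computed, normalized colour histograms, the value of k, and histogram intersection as the similarity measure are all assumptions.

```python
from collections import Counter

def histogram_intersection(h1, h2):
    """Similarity between two normalized colour histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def annotate_node(query_hist, database, k=5):
    """Label a view by a majority vote over the k best database matches.

    `database` is a list of (class_label, histogram) pairs whose
    histograms are pre-computed and normalized.
    """
    ranked = sorted(database,
                    key=lambda item: histogram_intersection(query_hist, item[1]),
                    reverse=True)
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy example: 3-bin histograms for the Woods (W) and Buildings (B) classes.
db = [("W", [0.7, 0.2, 0.1]), ("W", [0.6, 0.3, 0.1]),
      ("B", [0.1, 0.2, 0.7]), ("W", [0.8, 0.1, 0.1]),
      ("B", [0.2, 0.2, 0.6])]
print(annotate_node([0.65, 0.25, 0.10], db, k=3))  # → W
```

Voting over the top k matches, rather than taking the single nearest neighbour, is what buys robustness to illumination changes and intra-class variation.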

Natural Scene Categorization – p.8/32

Page 11: Categorization of natural images

System Overview (contd.)

• Environment retrieval
  ◦ Real-time operation
    • Colour histogram: compact feature vector
    • Histograms of all database images pre-computed
    • Linear time complexity (O(N)): on a P-IV 2.0 GHz, ∼100 ms for a single omnicam image
  ◦ Portable, low-cost system for the visually impaired
    • Modest hardware and software requirements
    • Easily put together using off-the-shelf components

Natural Scene Categorization – p.9/32

Page 13: Categorization of natural images

System Overview (contd.)

• Results

◦ Cylindrical concentric mosaics

Natural Scene Categorization – p.10/32

Page 15: Categorization of natural images

System Overview (contd.)

• Results

◦ Still omnicam image

Natural Scene Categorization – p.11/32

Page 17: Categorization of natural images

System Overview (contd.)

• Results

◦ Omnivideo sequence

[Figure: node annotations over an omnivideo sequence of 25 frames in the forward and backward directions; forward sectors labeled mostly W with occasional B and X, backward sectors labeled R and L]

Natural Scene Categorization – p.12/32

Page 19: Categorization of natural images

Analyzing our results

• System accuracy: close to 70%. This is not enough!
• Some scenes are inherently ambiguous!
• Often the second-best class is the correct class
• Limitations
  1. Limited discriminating power of the global colour histogram (GCH)
  2. A local colour histogram (LCH) based on tiling cannot be used
  3. Each frame is analyzed independently
• Possible solutions
  1. Adding memory to the system
  2. A clustering scheme before computing the similarity measure

Natural Scene Categorization – p.13/32

Page 22: Categorization of natural images

Method I. Adding memory to the system

• The system uses only the current observation in labeling
• A better idea: use all observations up to the current one
• Desired: a recursive implementation to calculate the posterior (it should run in real time!)
• Hidden Markov Model: parameter estimation using Kevin Murphy’s HMM toolkit
• Challenges
  1. Estimation of the transition matrix; a possible solution is to use a limited set of classes
  2. Enormous training data required
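The recursive posterior asked for here is the standard HMM forward (filtering) recursion, which costs only O(K²) per frame. A minimal sketch, not Kevin Murphy's toolkit: the transition matrix, per-frame likelihoods, and the two-class setup are all hypothetical.

```python
import numpy as np

def forward_filter(prior, A, likelihoods):
    """Recursive posterior over scene classes (HMM forward algorithm).

    prior       : initial class probabilities, shape (K,)
    A           : transition matrix, A[i, j] = P(class j | class i)
    likelihoods : per-frame observation likelihoods, shape (T, K)
    Returns the filtered posterior P(class_t | obs_1..t) for each frame.
    """
    belief = prior * likelihoods[0]
    belief /= belief.sum()
    posteriors = [belief]
    for lik in likelihoods[1:]:
        belief = (A.T @ belief) * lik   # predict with A, then correct
        belief /= belief.sum()          # one O(K^2) update per frame
        posteriors.append(belief)
    return np.array(posteriors)

# Two classes with "sticky" transitions: an isolated noisy middle frame
# (which alone would favour class 1) gets smoothed out by the memory.
A = np.array([[0.9, 0.1], [0.1, 0.9]])
lik = np.array([[0.8, 0.2], [0.4, 0.6], [0.8, 0.2]])
post = forward_filter(np.array([0.5, 0.5]), A, lik)
print(post.argmax(axis=1))  # → [0 0 0]
```

The toy run shows exactly the hoped-for effect of adding memory: a single ambiguous frame no longer flips the label.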

Natural Scene Categorization – p.14/32

Page 24: Categorization of natural images

Adding memory. . . (Results)

• Improved confidence in the results; however, negligible improvement in accuracy
• Reasons for poor performance
  ◦ Limited number of transitions between categories (as opposed to locations)
  ◦ Typical training data for HMMs runs to thousands of labels: difficult to collect such vast data
• Limitation: makes the system dependent on the training sequence

Natural Scene Categorization – p.15/32

Page 25: Categorization of natural images

Method II. Preclustering the image

• Presence of clutter; images from a broad domain
• Premise: the part of the image indicative of the semantic category forms a distinct region in the feature space

[Figure: some test images belonging to the ‘Water-bodies’ category]

• Possible solution: segment out the clutter in the scene

Natural Scene Categorization – p.16/32

Page 26: Categorization of natural images

Preclustering the image. . .

• K-means clustering of the image
• Use only pixels from the largest cluster to compute the colour histogram

[Figure: results of K-means clustering on the test images]

• Results
  ◦ Accuracy improves significantly: for the ‘water-bodies’ class, from 25% to about 72%
• Limitation: what about, say, a traffic scene?!
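The preclustering step can be sketched as below. This is an illustrative reconstruction, not the thesis implementation: the RGB colour space, the number of clusters, the 8-bins-per-channel histogram, and the toy "water" image are all assumptions.

```python
import numpy as np

def kmeans(pixels, k=3, iters=20, seed=0):
    """Plain K-means on an (N, 3) array of colour pixels; returns labels."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(pixels, dtype=float)
    centers = pts[rng.choice(len(pts), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = pts[labels == j].mean(0)
    return labels

def largest_cluster_histogram(pixels, k=3, bins=8):
    """Colour histogram computed from the largest K-means cluster only."""
    pts = np.asarray(pixels, dtype=float)
    labels = kmeans(pts, k)
    biggest = np.bincount(labels, minlength=k).argmax()
    hist, _ = np.histogramdd(pts[labels == biggest],
                             bins=(bins,) * 3, range=[(0, 256)] * 3)
    return hist.ravel() / hist.sum()

# Toy "water" image: 80% blue water pixels plus 20% reddish clutter.
img = np.vstack([np.tile([[30, 60, 200]], (80, 1)),
                 np.tile([[200, 50, 40]], (20, 1))])
h = largest_cluster_histogram(img, k=2)
```

Because the clutter pixels end up in the smaller cluster, the histogram `h` is built from the water pixels alone, which is what drives the accuracy gain reported above.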

Natural Scene Categorization – p.17/32

Page 29: Categorization of natural images

Model-based approaches

• Stochastic models used to learn semantic concepts from training images
• Use of normal perspective images
• Use of local image features
• Two models examined
  1. probabilistic Latent Semantic Analysis (pLSA)
  2. Maximum entropy models
• Use of the ‘bag of words’ approach

Natural Scene Categorization – p.18/32

Page 30: Categorization of natural images

Bag of words approach

• Local features are more robust to occlusions and spatial variations
• An image is represented as a collection of local patches
• Image patches are members of a learned (visual) vocabulary
• Positional relationships are not considered!
• Data represented by a co-occurrence matrix
• Notation
  ◦ D = {d1, . . . , dN}: corpus of documents
  ◦ W = {w1, . . . , wM}: dictionary of words
  ◦ Z = {z1, . . . , zK}: (latent) topic variables
  ◦ N = {n(w, d)}: co-occurrence table
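The co-occurrence table N = {n(w, d)} is just a word-by-document count matrix. A minimal sketch (the toy documents are hypothetical):

```python
import numpy as np

def cooccurrence_matrix(docs, vocab_size):
    """Build the word-document count table N = {n(w, d)}.

    `docs` is a list of images, each given as a list of visual-word
    indices (one per detected patch, after codebook quantization).
    Returns an (M, D) matrix: rows are words, columns are documents.
    """
    N = np.zeros((vocab_size, len(docs)), dtype=int)
    for d, words in enumerate(docs):
        for w in words:
            N[w, d] += 1
    return N

# Two tiny "images" over a 4-word vocabulary.
docs = [[0, 0, 2, 3], [1, 1, 1, 2]]
print(cooccurrence_matrix(docs, 4))
# → [[2 0]
#    [0 3]
#    [1 1]
#    [1 0]]
```

Discarding patch positions in this way is exactly the ‘bag of words’ assumption: only counts survive.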

Natural Scene Categorization – p.19/32

Page 32: Categorization of natural images

pLSA model . . .

• Generative model
  ◦ select a document d with probability P(d)
  ◦ select a latent class z with probability P(z|d)
  ◦ select a word w with probability P(w|z)
• Joint observation probability
  P(d, w) = P(d) P(w|d), where P(w|d) = Σ_{z∈Z} P(w|z) P(z|d)
• Modeling assumptions
  1. Observation pairs (d, w) are generated independently
  2. Conditional independence: P(w, d|z) = P(w|z) P(d|z)

Natural Scene Categorization – p.20/32

Page 35: Categorization of natural images

pLSA model . . .

• Model fitting
  ◦ Maximize the log-likelihood function
    L = Σ_{d∈D} Σ_{w∈W} n(d, w) log P(d, w)
  ◦ Equivalent to minimizing the KL divergence between the empirical distribution and the model
  ◦ EM algorithm to learn the model parameters
• Evaluating the model on unseen test images
  ◦ P(w|z) and P(z|d) learned from the training dataset
  ◦ ‘Fold-in’ heuristic for categorization: the learned factors P(w|z) are kept fixed, and the mixing coefficients P(z|d_test) are estimated by EM iterations
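The EM updates and the fold-in heuristic can be sketched as follows. This is a small illustrative implementation, not the (modified Rob Fergus) code used in the thesis, and the toy corpus is hypothetical.

```python
import numpy as np

def plsa_em(N, K, iters=200, seed=0):
    """EM for pLSA on a word-document count matrix N (M x D).

    Returns P(w|z) (M x K) and P(z|d) (K x D). Initialization is random,
    so different seeds can reach different local optima -- the very
    convergence issue discussed in these slides.
    """
    rng = np.random.default_rng(seed)
    M, D = N.shape
    Pw_z = rng.random((M, K)); Pw_z /= Pw_z.sum(0)
    Pz_d = rng.random((K, D)); Pz_d /= Pz_d.sum(0)
    for _ in range(iters):
        # E-step: P(z|d,w) ∝ P(w|z) P(z|d)
        Pz_dw = Pw_z[:, None, :] * Pz_d.T[None, :, :]      # (M, D, K)
        Pz_dw /= Pz_dw.sum(-1, keepdims=True) + 1e-12
        # M-step: re-estimate both factors from expected counts
        C = N[:, :, None] * Pz_dw                           # n(d,w) P(z|d,w)
        Pw_z = C.sum(1); Pw_z /= Pw_z.sum(0) + 1e-12
        Pz_d = C.sum(0).T; Pz_d /= Pz_d.sum(0) + 1e-12
    return Pw_z, Pz_d

def fold_in(Pw_z, n_test, iters=50, seed=1):
    """'Fold-in': keep P(w|z) fixed, estimate P(z|d_test) by EM."""
    rng = np.random.default_rng(seed)
    Pz = rng.random(Pw_z.shape[1]); Pz /= Pz.sum()
    for _ in range(iters):
        Pz_w = Pw_z * Pz                       # ∝ P(w|z) P(z|d_test)
        Pz_w /= Pz_w.sum(1, keepdims=True) + 1e-12
        Pz = n_test @ Pz_w
        Pz /= Pz.sum() + 1e-12
    return Pz

# Toy corpus: words 0-1 co-occur in docs 0-1, words 2-3 in docs 2-3.
N = np.array([[5, 5, 0, 0],
              [5, 5, 0, 0],
              [0, 0, 5, 5],
              [0, 0, 5, 5]])
Pw_z, Pz_d = plsa_em(N, K=2)
z_new = fold_in(Pw_z, np.array([3, 4, 0, 0]))  # a test image of words 0-1
```

On this trivially separable corpus the two latent topics recover the two word blocks, and the folded-in test document picks up the topic of docs 0-1.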

Natural Scene Categorization – p.21/32

Page 37: Categorization of natural images

pLSA model . . .

• Details of the experiment to evaluate the model
  ◦ 5 categories: houses, forests, mountains, streets and beaches
  ◦ Image dataset: COREL photo CDs, images from internet search engines, and personal image collections
  ◦ 100 images of each category
  ◦ Rob Fergus’s code modified for the experiments
  ◦ 128-dim SIFT feature used to represent a patch
  ◦ Visual codebook with 125 entries
• Image annotation: z = arg max_i P(z_i | d_test)

Natural Scene Categorization – p.22/32

Page 38: Categorization of natural images

pLSA model. . . Results

• 50 runs of the experiment, with random partitioning on each run
• Vastly different accuracy across runs: best case ∼46%, worst case 5%
• Analysis of the results
  ◦ The confusion matrix gives further insight
  ◦ Most labeling errors occur between houses and streets
  ◦ Ambiguity between mountains and forests

Natural Scene Categorization – p.23/32

Page 40: Categorization of natural images

Results using the pLSA model

Figure: Some images that were wrongly annotated by our system

Natural Scene Categorization – p.24/32

Page 41: Categorization of natural images

Results of the pLSA model . . .

• Comparison with the naive Bayes’ classifier

Figure: Confusion matrices for the pLSA and naive Bayes models

• 10-fold cross-validation test on the same dataset: mean accuracy ∼66%

Natural Scene Categorization – p.25/32

Page 42: Categorization of natural images

Analysis of our results

• Reasons for poor performance
  ◦ Model convergence!
  ◦ The local-optima problem in the EM algorithm
  ◦ The optimum value of the objective function depends on the initialization
  ◦ We initialize the algorithm randomly at each run!
• Possible solution: the deterministic annealing EM (DAEM) algorithm
• Even with DAEM there is no guarantee of converging to the globally optimal solution

Natural Scene Categorization – p.26/32

Page 44: Categorization of natural images

Maximum entropy models

• Maximum entropy prefers a uniform distribution when no data are available
• The best model is the one that:
  1. is consistent with the constraints imposed by the training data
  2. makes as few assumptions as possible
• Training dataset: {(x1, y1), (x2, y2), . . . , (xN, yN)}, where xi represents an image and yi represents a label
• Predicate functions
  ◦ Unigram predicate: co-occurrence statistics of a word and a label
    f_{v1,LABEL}(x, y) = 1 if y = LABEL and v1 ∈ x; 0 otherwise

Natural Scene Categorization – p.27/32

Page 45: Categorization of natural images

Maximum entropy models . . .

• Notation
  ◦ f: predicate function
  ◦ p̃(x, y): empirical distribution of the observed pairs
  ◦ p(y|x): stochastic model to be learnt
• Model fitting: the expected value of the predicate function w.r.t. the stochastic model should equal the expected value of the predicate measured from the training data
• Constrained optimization problem
  Maximize H(p) = −Σ_{x,y} p̃(x) p(y|x) log p(y|x)
  s.t. Σ_{x,y} p̃(x, y) f(x, y) = Σ_{x,y} p̃(x) p(y|x) f(x, y)
• The solution has exponential form: p(y|x) = (1/Z(x)) exp(Σ_{i=1}^{k} λi fi(x, y))
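Once the weights λi are known (in practice they are learned by an iterative-scaling or gradient method inside the toolkit), evaluating the exponential-form model is straightforward. A minimal sketch: the weights here are hand-picked purely for illustration, and the visual words 'sky' and 'brick' are hypothetical.

```python
import math

def unigram_predicate(word, label):
    """f_{v,LABEL}(x, y) = 1 iff y == LABEL and visual word v occurs in x."""
    return lambda x, y: 1.0 if (y == label and word in x) else 0.0

def maxent_posterior(x, labels, predicates, weights):
    """p(y|x) = exp(sum_i lambda_i f_i(x, y)) / Z(x)."""
    scores = {y: math.exp(sum(l * f(x, y) for l, f in zip(weights, predicates)))
              for y in labels}
    Z = sum(scores.values())        # partition function Z(x)
    return {y: s / Z for y, s in scores.items()}

# Hand-picked weights for illustration: 'sky' votes for 'beach',
# 'brick' votes for 'house'.
preds = [unigram_predicate('sky', 'beach'), unigram_predicate('brick', 'house')]
lam = [1.5, 2.0]
p = maxent_posterior({'sky', 'sand'}, ['beach', 'house'], preds, lam)
print(max(p, key=p.get))  # → beach
```

Labels with no active predicate get the uniform score exp(0) = 1, which is exactly the "as few assumptions as possible" behaviour the slide describes.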

Natural Scene Categorization – p.28/32

Page 48: Categorization of natural images

Results for the maximum entropy model

• Same dataset, feature and codebook as used for the pLSA experiment
• Evaluation using Zhang Le’s maximum entropy toolkit
• 25-fold cross-validation accuracy: ∼70%
• The second-best label is often the correct label: accuracy improves to 85%

Figure: Confusion matrices for the maximum entropy and naive Bayes models

Natural Scene Categorization – p.29/32

Page 51: Categorization of natural images

A comparative study

Method                     # of catg.   training # per catg.   perf. (%)
Maximum entropy                 5                50               70
pLSA                            5                50               46
Naive Bayes’ classifier         5                50               66
Fei-Fei                        13               100               64
Vogel                           6              ∼100               89.3
Vogel                           6              ∼100               67.2
Oliva                           8           250-300               89

Table: A performance comparison with other studies reported in the literature.

Natural Scene Categorization – p.30/32

Page 52: Categorization of natural images

Future Work

• Further investigation of the pLSA model
• The issue of model convergence
• The DAEM algorithm is not the ideal solution
• Using a richer feature set, e.g., a bank of Gabor filters
• For maximum entropy models, ways to define predicates that better capture semantic information

Natural Scene Categorization – p.31/32

Page 53: Categorization of natural images

THANK YOU

Natural Scene Categorization – p.32/32