Learning to Detect Objects in Images via a Sparse, Part-Based Representation
S. Agarwal, A. Awan and D. Roth, IEEE Transactions on Pattern Analysis and Machine Intelligence
Antón Escobedo, cse252c



Outline

• Introduction
• Problem Specification
• Related Work
• Overview of the Approach
• Evaluation
• Experimental Results and Analysis
• Conclusion and Future Scope


Introduction

• Automatic detection of objects in images
• Different objects belonging to the same category can vary widely in appearance
• Goal: a successful object detection system
• Proposed solution: a sparse, part-based representation
• A part-based representation is computationally efficient and has its roots in biological vision


Problem Specification

Input: an image
Output: a list of locations at which instances of the object class are detected in the image

The experiments use images of side views of cars, but the approach can be applied to any object that consists of distinguishable parts arranged in a relatively fixed spatial configuration.

The present problem is a “detection” problem rather than a simple “classification” problem: the system must not only decide whether the object is present but also locate every instance of it in the image.


Previous Related Work

• Raw pixel intensities
• Global image features
• Local features
• Part-based representations using hand-labeled features


Algorithm Overview

Four stages:

1. Vocabulary construction: building a vocabulary of parts that will represent objects
2. Image representation: input images are represented in terms of binary feature vectors
3. Learning a classifier: a classifier is trained on two target classes, feature vectors of objects (positive) and feature vectors of non-objects (negative)
4. Detection hypothesis using the learned classifier: a classifier activation map for the single-scale case and a classifier activation pyramid for the multi-scale case


Vocabulary Construction

Interest points are extracted using the Förstner interest operator.

Experiments were carried out on 50 representative images of size 100 x 40 pixels. A total of 400 patches, each of size 13 x 13 pixels, were extracted around the detected interest points.

To facilitate learning, the patches were grouped by a bottom-up clustering procedure in which similarity was measured by normalized correlation.

Similarity between two clusters C1 and C2 is then measured by the average similarity between their respective patches:

$$\text{similarity}(C_1, C_2) = \frac{1}{|C_1|\,|C_2|} \sum_{p_1 \in C_1} \sum_{p_2 \in C_2} \text{similarity}(p_1, p_2)$$

where the similarity of two patches is their normalized correlation, with E(·) the average over pixel positions:

$$\text{NormalizedCorrelation}(p_1, p_2) = \frac{E(p_1 p_2)}{\sqrt{E(p_1^2)\,E(p_2^2)}}$$
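The cluster similarity on this slide can be sketched in Python as follows. This is a minimal sketch, assuming patches are NumPy arrays and E(·) is the mean over pixels; the helper `bottom_up_cluster` and its stopping `threshold` are hypothetical, since the slide does not say when merging stops.

```python
import numpy as np

def normalized_correlation(p1, p2):
    """E(p1 p2) / sqrt(E(p1^2) E(p2^2)), where E(.) is the mean over pixels."""
    a = np.asarray(p1, dtype=float).ravel()
    b = np.asarray(p2, dtype=float).ravel()
    denom = np.sqrt(np.mean(a * a) * np.mean(b * b))
    return float(np.mean(a * b) / denom) if denom > 0 else 0.0

def cluster_similarity(c1, c2):
    """Average patch similarity over all pairs (p1 in C1, p2 in C2)."""
    total = sum(normalized_correlation(p1, p2) for p1 in c1 for p2 in c2)
    return total / (len(c1) * len(c2))

def bottom_up_cluster(patches, threshold):
    """Hypothetical agglomerative loop: start with one cluster per patch and merge
    the most similar pair until no pair reaches `threshold` (assumed stopping rule)."""
    clusters = [[p] for p in patches]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cluster_similarity(clusters[i], clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair              # i < j, so deleting j leaves index i valid
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

a = np.array([[1.0, 2.0], [3.0, 4.0]])
print(len(bottom_up_cluster([a, 2 * a, -a], threshold=0.5)))  # prints 2
```

The naive loop recomputes every pairwise cluster similarity on each merge; for the 400 patches on the slide a practical version would cache the pairwise patch similarities.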


Vocabulary Construction

[Figures: the Förstner operator applied to a sample image; sample patches; clusters formed from the sample patches]


Image Representation

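For each highlighted patch q in an image (i.e. each patch around a detected interest point), a similarity-based indexing into the part vocabulary $\mathcal{P}$ is performed. The similarity of a vocabulary part P, which may be a cluster of several patches, to q is

$$\text{similarity}(P, q) = \frac{1}{|P|} \sum_{p \in P} \text{similarity}(p, q)$$

and the most similar vocabulary part $P^*(q)$ is given by

$$P^*(q) = \arg\max_{P \in \mathcal{P}} \text{similarity}(P, q)$$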
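A minimal sketch of this indexing step, assuming each vocabulary part is stored as a list of its member patches (NumPy arrays) and that patch similarity is the normalized correlation used for vocabulary construction. The function names are hypothetical.

```python
import numpy as np

def normalized_correlation(p, q):
    """E(p q) / sqrt(E(p^2) E(q^2)), with E(.) the mean over pixels."""
    a = np.asarray(p, dtype=float).ravel()
    b = np.asarray(q, dtype=float).ravel()
    denom = np.sqrt(np.mean(a * a) * np.mean(b * b))
    return float(np.mean(a * b) / denom) if denom > 0 else 0.0

def part_similarity(part, q):
    """similarity(P, q): mean similarity of patch q to the patches p in part P."""
    return sum(normalized_correlation(p, q) for p in part) / len(part)

def index_patch(vocabulary, q):
    """Index of P*(q) = argmax over parts P in the vocabulary of similarity(P, q)."""
    scores = [part_similarity(part, q) for part in vocabulary]
    return int(np.argmax(scores))

a = np.array([[1.0, 2.0], [3.0, 4.0]])
vocabulary = [[a, 2 * a], [-a]]          # two toy parts
print(index_patch(vocabulary, 3 * a))    # prints 0
```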


Image Representation: Feature Vector

Spatial relations among the parts detected in an image are defined in terms of distance (5 bins) and direction (8 ranges of 45 degrees each). Because the two parts of each pair are considered in a fixed order across the image, only 4 of the direction ranges need to be distinguished, giving 5 × 4 = 20 possible relations between 2 parts.

Each positive window contains 2 to 6 parts.

Each 100 x 40 training image is represented as a feature vector with 290 elements. Its features are of two types:

• $P_n^{(i)}$: the i-th occurrence of a part of type n in the image (1 ≤ n ≤ 270; each n is a particular part cluster)
• $R_m^{(j)}(P_{n_1}, P_{n_2})$: the j-th occurrence of relation $R_m$ between a part of type $n_1$ and a part of type $n_2$ (1 ≤ m ≤ 20; each m is a distance-direction combination)
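A sketch of how one window's detected parts could be turned into these two kinds of features, assuming each part is given as its part type n and an (x, y) position. The distance bin edges in `DIST_EDGES` are hypothetical, and ordering each pair left to right so that four 45-degree direction ranges suffice (5 × 4 = 20 relation indices) is an assumption of this sketch. Features are written as strings such as `P5(1)` only for readability.

```python
import math
from collections import Counter

DIST_EDGES = (10.0, 20.0, 35.0, 60.0)  # hypothetical edges in pixels: 4 edges -> 5 bins

def relation_index(pos1, pos2, dist_edges=DIST_EDGES):
    """Map a pair of part positions to a relation index m in 1..20."""
    (x1, y1), (x2, y2) = sorted([pos1, pos2])    # left-to-right order, so dx >= 0
    dx, dy = x2 - x1, y2 - y1
    dist_bin = sum(math.hypot(dx, dy) >= e for e in dist_edges)   # 0..4
    angle = math.degrees(math.atan2(dy, dx))                       # in [-90, 90]
    dir_bin = min(int((angle + 90.0) // 45.0), 3)                  # 0..3
    return dist_bin * 4 + dir_bin + 1

def extract_features(parts, dist_edges=DIST_EDGES):
    """parts: list of (part_type, (x, y)). Returns the active features of the window."""
    features = []
    part_seen = Counter()
    for n, _ in parts:                       # P_n^(i): i-th occurrence of part type n
        part_seen[n] += 1
        features.append(f"P{n}({part_seen[n]})")
    ordered = sorted(parts, key=lambda part: part[1])
    rel_seen = Counter()
    for a in range(len(ordered)):            # R_m^(j)(P_n1, P_n2) for every pair
        for b in range(a + 1, len(ordered)):
            (n1, pos1), (n2, pos2) = ordered[a], ordered[b]
            m = relation_index(pos1, pos2, dist_edges)
            rel_seen[(m, n1, n2)] += 1
            features.append(f"R{m}({rel_seen[(m, n1, n2)]})(P{n1},P{n2})")
    return features

print(extract_features([(5, (0, 0)), (7, (30, 0))]))
# prints ['P5(1)', 'P7(1)', 'R11(1)(P5,P7)']
```

Only the features that are present are listed, which is the sparse form in which SNoW consumes examples.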


Learning a Classifier

The classifier is trained on 1000 labeled images, each 100 x 40 pixels in size.

No synthetic training images are used.

Positive examples: various cars against varied backgrounds

Negative examples: natural scenes such as buildings and roads

The feature vectors are very high-dimensional: 270 part types, 20 relations, and repeated occurrences of each.

The Sparse Network of Winnows (SNoW) learning architecture is therefore used.

(Winnow: to reduce in number until only the best are left.)


SNoW: a sparse network of linear units over a Boolean or real-valued feature space

[Figure: an input layer (the feature layer), through which each example e is presented as a list of its active features, connected to a layer of target nodes by dynamically allocated edges; each target node computes an activation]


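SNoW: the predicted target t* for example e is

$$t^*(e) = \arg\max_{t \in T} \sigma\big(\theta_t, \Omega_t(e)\big)$$

where $\Omega_t(e)$ is the activation of target node t, calculated by summing the weights $w_{t,i}$ of target t over the active features i of e:

$$\Omega_t(e) = \sum_{i \in e} w_{t,i}$$

and σ is a learning-algorithm-specific sigmoid function whose transition from an output close to 0 to an output close to 1 is centered around θ:

$$\sigma(\theta, \Omega) = \frac{1}{1 + e^{\theta - \Omega}}$$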
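A minimal sketch of SNoW-style prediction, assuming each target node is stored as a threshold θ_t plus a dictionary of weights keyed by feature; features without a link contribute nothing to the activation. The target names and weights in the usage lines are hypothetical.

```python
import math

def activation(weights, example):
    """Omega_t(e): sum of target t's weights over the active features of e."""
    return sum(weights.get(f, 0.0) for f in example)

def sigmoid(theta, omega):
    """sigma(theta, Omega) = 1 / (1 + e^(theta - Omega)); equals 0.5 at Omega = theta."""
    z = min(max(theta - omega, -700.0), 700.0)  # clip so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(z))

def predict(targets, example):
    """targets: {name: (theta, weights)}. Returns t* = argmax_t sigma(theta_t, Omega_t(e))."""
    return max(targets, key=lambda t: sigmoid(targets[t][0], activation(targets[t][1], example)))

targets = {
    "car": (3.5, {"P5(1)": 2.0, "R11(1)(P5,P7)": 2.0}),   # hypothetical weights
    "non-car": (3.5, {"P9(1)": 2.0}),
}
print(predict(targets, ["P5(1)", "R11(1)(P5,P7)"]))  # prints car
```

Because σ is monotonic in Ω, with equal thresholds the argmax reduces to picking the target with the largest activation; the sigmoid matters when thresholds differ between targets.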


SNoW: Basic Learning Rules

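Several weight-update rules can be used; all are variations of Winnow and Perceptron.

Winnow update rule: multiplicative and mistake-driven. When a positive example is misclassified, the weights of its active features are multiplied by a promotion parameter α > 1; when a negative example is misclassified, they are multiplied by a demotion parameter β < 1.

With Winnow, the number of examples required to learn a linear function grows linearly with the number of relevant features and only logarithmically with the total number of features.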


A Training Example

Each example lists its target label followed by its active features (targets 1, 2, 3; features 1001-1009):

1, 1001, 1006:
2, 1002, 1007, 1008:
3, 1006, 1004:
1, 1004, 1007:
3, 1004, 1005, 1009:

Update rule: Winnow α = 2, β = ½, θ = 3.5

[Figure: weights of target nodes 1, 2 and 3 over features 1001-1009 as the examples are presented]

Example 1001, 1005, 1007: activations 4, 2 and 1 for targets 1, 2 and 3, so target 1 is predicted
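The worked example can be replayed with a short Winnow sketch. Its assumptions are not stated on the slide: a new link gets initial weight 1 when a feature is first active in an example labeled with that target, a positive example is promoted when its activation is at most θ, and a negative example is demoted when its activation exceeds θ. Under these assumptions the sketch reproduces the activations 4, 2 and 1 given for 1001, 1005, 1007.

```python
ALPHA, BETA, THETA = 2.0, 0.5, 3.5   # promotion, demotion, threshold (from the slide)
INIT_WEIGHT = 1.0                     # assumed initial weight of a newly allocated link

def train_winnow(examples, targets):
    """examples: list of (label, active_features). Keeps one weight dict per target node."""
    weights = {t: {} for t in targets}
    for label, feats in examples:
        for t in targets:
            w = weights[t]
            positive = (t == label)
            if positive:
                for f in feats:
                    w.setdefault(f, INIT_WEIGHT)   # dynamic link allocation
            omega = sum(w[f] for f in feats if f in w)
            if positive and omega <= THETA:       # missed positive: promote
                for f in feats:
                    w[f] *= ALPHA
            elif not positive and omega > THETA:  # false positive: demote linked features
                for f in feats:
                    if f in w:
                        w[f] *= BETA
    return weights

def activations(weights, feats):
    """Omega_t for every target t on the given active features."""
    return {t: sum(w.get(f, 0.0) for f in feats) for t, w in weights.items()}

examples = [(1, [1001, 1006]), (2, [1002, 1007, 1008]), (3, [1006, 1004]),
            (1, [1004, 1007]), (3, [1004, 1005, 1009])]
weights = train_winnow(examples, targets=[1, 2, 3])
print(activations(weights, [1001, 1005, 1007]))  # prints {1: 4.0, 2: 2.0, 3: 1.0}
```

In this run no demotion occurs: every update promotes the weights of a positive example from 1 to 2, and the links 1005 and 1009 of target 3 keep their initial weight because the fifth example is already classified correctly.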


Detection Hypothesis using Learned Classifier

Classifier activation map: for the single-scale case

• Neighborhood Suppression: based on non-maximum suppression
• Repeated Part Elimination: a greedy algorithm that uses windows around the highest activation points
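Both strategies can be sketched as follows. Neighborhood Suppression is written as standard non-maximum suppression on a 2-D activation map with a hypothetical neighborhood size. Repeated Part Elimination is one plausible reading of the slide: repeatedly take the highest-scoring window as a detection and eliminate the parts inside it; `score` is a hypothetical stand-in for the classifier activation computed from the parts in a window.

```python
import numpy as np

def neighborhood_suppression(act, threshold, half_h, half_w):
    """Non-maximum suppression on a 2-D activation map: keep (row, col) positions whose
    activation is at least `threshold` and maximal in their local neighbourhood."""
    H, W = act.shape
    detections = []
    for i in range(H):
        for j in range(W):
            a = act[i, j]
            if a < threshold:
                continue
            nb = act[max(0, i - half_h):i + half_h + 1, max(0, j - half_w):j + half_w + 1]
            if a >= nb.max():   # ties are all kept in this sketch
                detections.append((i, j))
    return detections

def repeated_part_elimination(parts, origins, score, threshold, win_w=100, win_h=40):
    """Greedy sketch: pick the window with the highest score, record it as a detection,
    eliminate the parts inside it, and repeat until no window reaches `threshold`."""
    def inside(p, o):
        return o[0] <= p[0] < o[0] + win_w and o[1] <= p[1] < o[1] + win_h

    remaining, detections = list(parts), []
    while remaining:
        scored = [(score([p for p in remaining if inside(p, o)]), o) for o in origins]
        best_score, best = max(scored, key=lambda s: s[0])
        if best_score < threshold:
            break
        detections.append(best)
        kept = [p for p in remaining if not inside(p, best)]
        if len(kept) == len(remaining):   # nothing eliminated: stop rather than loop forever
            break
        remaining = kept
    return detections

act = np.zeros((5, 5))
act[1, 1], act[1, 2], act[4, 4] = 0.9, 0.8, 0.95
print(neighborhood_suppression(act, 0.5, 1, 1))  # prints [(1, 1), (4, 4)]
```

The difference in design: suppression discards weaker activations near a stronger one, whereas part elimination removes the evidence (the parts) already explained by a detection, so an overlapping second object is only reported if its remaining parts still score highly.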


Detection: Classifier Activation Pyramid

Scale the input image a number of times to form a multi-scale image pyramid

Apply the learned classifier to fixed-size windows in each image in the pyramid

Form a three-dimensional classifier activation pyramid instead of the earlier two-dimensional classifier activation map.
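A minimal sketch of the pyramid, assuming a grayscale NumPy image. The scale factors are hypothetical, nearest-neighbour resampling stands in for proper image scaling, and `classify` is a hypothetical stand-in for the learned classifier's activation on one 100 x 40 window. Because the maps at different scales have different sizes, the "three-dimensional" pyramid is kept as a list of (scale, 2-D map) pairs rather than one dense array.

```python
import numpy as np

def downscale_nearest(img, s):
    """Nearest-neighbour resize by factor s (a simple stand-in for proper resampling)."""
    H, W = img.shape
    h, w = max(1, int(round(H * s))), max(1, int(round(W * s)))
    rows = np.minimum((np.arange(h) / s).astype(int), H - 1)
    cols = np.minimum((np.arange(w) / s).astype(int), W - 1)
    return img[np.ix_(rows, cols)]

def activation_pyramid(img, classify, scales, win_h=40, win_w=100, step=1):
    """Apply `classify` to every win_h x win_w window at each scale and
    return a list of (scale, 2-D activation map)."""
    pyramid = []
    for s in scales:
        im = downscale_nearest(img, s)
        H, W = im.shape
        if H < win_h or W < win_w:
            continue  # image too small for a full window at this scale
        amap = np.array([[classify(im[i:i + win_h, j:j + win_w])
                          for j in range(0, W - win_w + 1, step)]
                         for i in range(0, H - win_h + 1, step)])
        pyramid.append((s, amap))
    return pyramid

img = np.zeros((80, 200))
pyr = activation_pyramid(img, lambda w: float(w.mean()), scales=(1.0, 0.5, 0.25))
print([(s, m.shape) for s, m in pyr])  # prints [(1.0, (41, 101)), (0.5, (1, 1))]
```

A detection at scale s and map position (i, j) corresponds to a window of size (40/s) x (100/s) at (i/s, j/s) in the original image.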


Evaluation Criteria

Test Set I consists of 170 images containing 200 cars of the same size and is used for the single-scale case. For each car in the test images, the location of the best 100 x 40 window containing the car is determined.

Test Set II consists of 108 images containing 139 cars of different sizes and is used for the multi-scale case. For each car in the test images, both the location and the scale of the best 100 x 40 window containing the car are determined.


Performance Measures

The goal is to maximize the number of correct detections and minimize the number of false detections.

One way to express the trade-off between correct and false detections is the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate:

$$\text{True positive rate} = \frac{TP}{n_P}, \qquad \text{False positive rate} = \frac{FP}{n_N}$$

where TP and FP are the numbers of true and false positives, and $n_P$ and $n_N$ are the total numbers of positives and negatives in the data set.

This measures the accuracy of the system as a “classifier” rather than as a “detector”.


Performance Measures (contd.)

What we really want to know is how many of the objects the system detects (given by recall) and how often its detections are false (given by 1 - precision). This trade-off is captured accurately by the curve of recall vs. (1 - precision), where

$$\text{Recall} = \frac{TP}{n_P}, \qquad 1 - \text{Precision} = \frac{FP}{TP + FP}$$

The threshold parameter that achieves the best trade-off between the two quantities is identified by the point of highest F-measure, where

$$\text{F-measure} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}}$$
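The three measures can be computed directly from detection counts; a minimal sketch with a hypothetical `detection_metrics` helper. The usage counts TP = 169 and FP = 140 with n_P = 200 are back-computed from the first row of the single-scale Neighborhood Suppression results (84.5% recall, 54.69% precision), not taken from the paper.

```python
def detection_metrics(tp, fp, n_pos):
    """Recall = TP/nP, Precision = TP/(TP+FP), F = 2RP/(R+P), all as fractions."""
    recall = tp / n_pos
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    f = 2 * recall * precision / (recall + precision) if recall + precision > 0 else 0.0
    return recall, precision, f

r, p, f = detection_metrics(169, 140, 200)
print(round(r * 100, 2), round(p * 100, 2), round(f * 100, 2))  # prints 84.5 54.69 66.4
```

Substituting the definitions gives the equivalent form F = 2TP / (n_P + TP + FP), so the F-measure can be read off without computing recall and precision first.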


Experimental Results

Activation threshold | Recall R = TP/200 (%) | Precision P = TP/(TP+FP) (%) | F-measure 2RP/(R+P) (%)
0.40 | 84.5 | 54.69 | 66.40
0.85 | 76.5 | 77.66 | 77.08
0.9995 | 4.0 | 100 | 7.69

Single-scale detection with Neighborhood Suppression Algorithm

Activation threshold | Recall R = TP/200 (%) | Precision P = TP/(TP+FP) (%) | F-measure 2RP/(R+P) (%)
0.20 | 91.5 | 24.73 | 38.94
0.85 | 72.5 | 81.46 | 76.72
0.995 | 4.0 | 100 | 7.69

Single-scale detection with Repeated Part Elimination Algorithm


Experimental Results (contd.)

Activation threshold | Recall R = TP/139 (%) | Precision P = TP/(TP+FP) (%) | F-measure 2RP/(R+P) (%)
0.65 | 50.36 | 24.56 | 33.02
0.95 | 38.85 | 49.09 | 43.37
0.9999 | 2.88 | 100 | 5.59

Multi-scale detection with Neighborhood Suppression Algorithm

Activation threshold | Recall R = TP/139 (%) | Precision P = TP/(TP+FP) (%) | F-measure 2RP/(R+P) (%)
0.20 | 80.58 | 8.43 | 15.27
0.95 | 39.57 | 49.55 | 44.0
0.9999 | 2.88 | 100 | 5.59

Multi-scale detection with Repeated Part Elimination Algorithm


Some Graphical Results


Analysis: A. Performance of Interest Operator


Analysis: B. Performance of Part Matching Process


Analysis: C. Performance of Learned Classifier


Conclusion

• Automatic vocabulary construction from sample images
• Methodologies for object detection: building a detector from a classifier, and standardizing the evaluation criteria
• Well suited to objects that consist of distinguishable parts


Questions?

Slides adapted from http://www.cs.uga.edu/~ananda/ML_Talk.ppt and http://l2r.cs.uiuc.edu/~cogcomp/tutorial/SNoW.ppt