
Fine-grained Classification

Marcel Simon

Computer Vision Group

Department of Mathematics and Computer Science

Friedrich Schiller University Jena, Germany

[email protected]

http://www.inf-cv.uni-jena.de/

Seminar Talk

23.06.2015


Outline

1 Motivation

2 Part Constellation Models

3 Experiments

4 Summary


Motivation

Three birds, but only two species. Which two images show the same species?

High intra-class, low inter-class variance!


Object Parts

Required for every kind of localized feature

Problem: identification and robust detection

Additional challenge: ambiguous location


Part Proposals from CNNs

Pretrained CNNs contain inherent part detectors

Part detectors are generic, shared among all classes of ImageNet

Task: unsupervised selection of the relevant part detectors for each object category

[Figure: input images → CNN → neural activation maps → 256 part proposals; input-to-output pipeline of Simon et al. (2014), ACCV]
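The slide does not show how a single location is read off each activation map. A minimal sketch, assuming each of the 256 channels contributes exactly one proposal at the position of its strongest activation, mapped back to image coordinates via the cell centers. The function name `part_proposals` and the nested-list map format are hypothetical, and the argmax is a simplifying assumption rather than the published localization procedure.

```python
from typing import List, Tuple

Map = List[List[float]]  # one activation map, indexed [row][col]


def part_proposals(activation_maps: List[Map], image_h: int, image_w: int) -> List[Tuple[float, float]]:
    """Return one (x, y) proposal per channel: the argmax of its activation map,
    mapped from feature-map cells to image pixel coordinates (cell centers)."""
    proposals = []
    for amap in activation_maps:
        rows, cols = len(amap), len(amap[0])
        # Locate the strongest response of this channel (first one on ties).
        best_r, best_c = max(
            ((r, c) for r in range(rows) for c in range(cols)),
            key=lambda rc: amap[rc[0]][rc[1]],
        )
        # Map the center of the winning cell back to image coordinates.
        x = (best_c + 0.5) * image_w / cols
        y = (best_r + 0.5) * image_h / rows
        proposals.append((x, y))
    return proposals
```

For example, a 2×2 map that peaks in its top-right cell on a 100×100 image yields the proposal (75.0, 25.0).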


Random Part Selection

Assumption: part detectors are generic interest point detectors, each specialized on a specific pattern

For classification, compute features at these interest points

[Figure: 256 part proposals → random selection → result: 8 selected parts]
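A sketch of the random baseline under two assumptions that the slide does not state: the same randomly drawn channel indices are reused for every image, and part features are computed in a fixed-size square around each selected proposal. `select_random_parts`, `part_boxes` and `half_size` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def select_random_parts(num_proposals: int = 256, num_parts: int = 8, seed: int = 0) -> List[int]:
    """Draw num_parts distinct proposal (channel) indices uniformly at random."""
    rng = random.Random(seed)  # seeded so the selection is reproducible
    return sorted(rng.sample(range(num_proposals), num_parts))


def part_boxes(proposals: Sequence[Point], selected: Sequence[int], half_size: float) -> List[Box]:
    """Square regions centered on the selected proposals of one image.
    The boxes are not clipped to the image borders here."""
    return [
        (x - half_size, y - half_size, x + half_size, y + half_size)
        for x, y in (proposals[k] for k in selected)
    ]
```

The indices are drawn once (`selected = select_random_parts()`), then `part_boxes(proposals_of_image, selected, half_size)` gives the regions for each image.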


Model-based Part Selection

Example: which part detector is relevant for birds?

Idea: select the parts that fit a part constellation model

[Figure: 256 part proposals → part constellation model (view 1, view 2) → result: 8 selected parts]
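To illustrate only the intuition behind model-based selection, here is a simplified heuristic (not the constellation model itself) that keeps the proposals whose offset to an anchor is most stable across training images. The anchors are assumed to be given, whereas the part constellation model estimates anchors, views and the part selection jointly. All names are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def offset_spread(locations: Sequence[Point], anchors: Sequence[Point]) -> float:
    """Mean squared deviation of the offsets (location - anchor) from their mean offset."""
    offsets = [(x - ax, y - ay) for (x, y), (ax, ay) in zip(locations, anchors)]
    mean_x = fmean(ox for ox, _ in offsets)
    mean_y = fmean(oy for _, oy in offsets)
    return fmean((ox - mean_x) ** 2 + (oy - mean_y) ** 2 for ox, oy in offsets)


def select_consistent_parts(mu: Sequence[Sequence[Point]], anchors: Sequence[Point],
                            num_parts: int = 8) -> List[int]:
    """mu[i][p] is the location of proposal p in image i; anchors[i] is assumed known.
    Return the num_parts proposal indices with the most stable offset to the anchor."""
    num_images, num_proposals = len(mu), len(mu[0])
    scores = [
        offset_spread([mu[i][p] for i in range(num_images)], anchors)
        for p in range(num_proposals)
    ]
    return sorted(range(num_proposals), key=lambda p: scores[p])[:num_parts]
```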


Part Constellation Model

Assumption: normalized distances between parts are constant

Part locations are Gaussian distributed, with their mean defined relative to an anchor point

[Figure: anchor point, mean part locations, and relative offsets]
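A minimal sketch of the per-part Gaussian, assuming an isotropic covariance σ²I and coordinates normalized by image width and height (the slide only states that normalized distances are constant; the normalization choice and the function names are hypothetical). Because the log-density is a negative squared distance up to constants, maximizing it for a fixed σ amounts to minimizing ‖µ − (a + d)‖², which is the form of the learning objective.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def normalize(point: Point, width: float, height: float) -> Point:
    """Scale pixel coordinates to [0, 1] so offsets are comparable across image sizes."""
    return point[0] / width, point[1] / height


def part_log_likelihood(location: Point, anchor: Point, mean_offset: Point, sigma: float) -> float:
    """log N(location | anchor + mean_offset, sigma^2 * I) for a 2-D isotropic Gaussian."""
    dx = location[0] - (anchor[0] + mean_offset[0])
    dy = location[1] - (anchor[1] + mean_offset[1])
    squared_distance = dx * dx + dy * dy
    return -squared_distance / (2.0 * sigma ** 2) - math.log(2.0 * math.pi * sigma ** 2)
```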


Learning Constellation Model

Given the part proposal locations µ, estimate part model parameters Γ:

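$$\hat{\Gamma} = \operatorname*{argmax}_{\Gamma} \; p(\Gamma \mid \mu) = \ldots = \operatorname*{argmin}_{\Gamma \in \mathcal{M}} \sum_{i=1}^{N} \sum_{p=1}^{P} \sum_{v=1}^{V} s_{i,v}\, b_{v,p}\, h_{i,p}\, \lVert \mu_{i,p} - (a_i + d_{v,p}) \rVert^2$$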
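The slide does not define every symbol. A sketch that evaluates the objective, assuming s_{i,v}, b_{v,p} and h_{i,p} are binary indicators (image i is assigned to view v, part p is selected in view v, part p is visible in image i), a_i is the anchor of image i and d_{v,p} the mean offset of part p in view v. This reading is our assumption, the feasible set M is not modeled, and `constellation_objective` is a hypothetical name.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def constellation_objective(
    s: Sequence[Sequence[int]],     # s[i][v]: 1 if image i is assigned to view v
    b: Sequence[Sequence[int]],     # b[v][p]: 1 if part p is selected in view v
    h: Sequence[Sequence[int]],     # h[i][p]: 1 if part p is visible in image i
    mu: Sequence[Sequence[Point]],  # mu[i][p]: proposal location of part p in image i
    a: Sequence[Point],             # a[i]: anchor location in image i
    d: Sequence[Sequence[Point]],   # d[v][p]: mean offset of part p in view v
) -> float:
    """Sum over i, p, v of s[i][v] * b[v][p] * h[i][p] * ||mu[i][p] - (a[i] + d[v][p])||^2."""
    total = 0.0
    for i in range(len(mu)):
        for p in range(len(mu[i])):
            for v in range(len(d)):
                if s[i][v] * b[v][p] * h[i][p]:  # only active terms contribute
                    dx = mu[i][p][0] - (a[i][0] + d[v][p][0])
                    dy = mu[i][p][1] - (a[i][1] + d[v][p][1])
                    total += dx * dx + dy * dy
    return total
```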


Solving the Problem

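$$\hat{\Gamma} = \operatorname*{argmin}_{\Gamma \in \mathcal{M}} \sum_{i=1}^{N} \sum_{p=1}^{P} \sum_{v=1}^{V} \underbrace{s_{i,v}\, b_{v,p}\, h_{i,p}}_{t_{i,v,p} \in \{0,1\}} \lVert \mu_{i,p} - (a_i + d_{v,p}) \rVert^2$$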

Solved iteratively by optimizing each variable separately while keeping the others fixed. Each update has an intuitive solution, for example:

$$d_{v,p} = \frac{\sum_{i=1}^{N} t_{i,v,p}\, (\mu_{i,p} - a_i)}{\sum_{i'=1}^{N} t_{i',v,p}}.$$
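Setting the derivative of the objective with respect to d_{v,p} to zero gives exactly this weighted mean of the offsets µ_{i,p} − a_i over the active terms, and the same argument yields an analogous update for each anchor a_i. A sketch of these two closed-form steps, reading s, b and h as binary indicators (an assumption); the updates of the discrete variables, which depend on the constraint set M, are not shown. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Indicator = Sequence[Sequence[int]]


def update_offsets(s: Indicator, b: Indicator, h: Indicator,
                   mu: Sequence[Sequence[Point]], a: Sequence[Point],
                   d_old: Sequence[Sequence[Point]]) -> List[List[Point]]:
    """d[v][p] = sum_i t_ivp * (mu[i][p] - a[i]) / sum_i t_ivp, with t = s * b * h."""
    d = [list(row) for row in d_old]
    for v in range(len(b)):
        for p in range(len(b[v])):
            sx = sy = n = 0.0
            for i in range(len(mu)):
                t = s[i][v] * b[v][p] * h[i][p]
                sx += t * (mu[i][p][0] - a[i][0])
                sy += t * (mu[i][p][1] - a[i][1])
                n += t
            if n > 0:  # keep the previous offset if no image uses this part
                d[v][p] = (sx / n, sy / n)
    return d


def update_anchors(s: Indicator, b: Indicator, h: Indicator,
                   mu: Sequence[Sequence[Point]], d: Sequence[Sequence[Point]],
                   a_old: Sequence[Point]) -> List[Point]:
    """a[i] = sum_{v,p} t_ivp * (mu[i][p] - d[v][p]) / sum_{v,p} t_ivp."""
    a = list(a_old)
    for i in range(len(mu)):
        sx = sy = n = 0.0
        for v in range(len(d)):
            for p in range(len(mu[i])):
                t = s[i][v] * b[v][p] * h[i][p]
                sx += t * (mu[i][p][0] - d[v][p][0])
                sy += t * (mu[i][p][1] - d[v][p][1])
                n += t
        if n > 0:  # keep the previous anchor if the image has no active parts
            a[i] = (sx / n, sy / n)
    return a
```

Since each step minimizes the objective exactly over one block of variables, alternating them never increases the objective.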


Classification Pipeline

[Figure: Part Selection stage: part proposals → part selection. Feature Extraction stage: detect parts → feature extraction → SVM]
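A sketch of the feature-assembly step, assuming the image descriptor is the concatenation of features from the whole image and from each detected part, with part regions given as axis-aligned boxes. `mean_intensity` is a toy stand-in for CNN features and all names are hypothetical; the SVM is not shown (any linear SVM, for example scikit-learn's `LinearSVC`, could be trained on the resulting vectors).

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive
Image = List[List[float]]        # grayscale pixels, indexed [row][col]


def crop(image: Image, box: Box) -> Image:
    """Cut out a box, clipped to the image borders."""
    height, width = len(image), len(image[0])
    x0, y0 = max(0, box[0]), max(0, box[1])
    x1, y1 = min(width, box[2]), min(height, box[3])
    return [row[x0:x1] for row in image[y0:y1]]


def mean_intensity(patch: Image) -> List[float]:
    """Toy stand-in for CNN features: a 1-D feature holding the mean pixel value."""
    values = [v for row in patch for v in row]
    return [sum(values) / len(values)] if values else [0.0]


def image_descriptor(image: Image, part_boxes: Sequence[Box],
                     extract: Callable[[Image], List[float]]) -> List[float]:
    """Concatenate the features of the whole image and of every detected part."""
    descriptor = list(extract(image))
    for box in part_boxes:
        descriptor.extend(extract(crop(image, box)))
    return descriptor
```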


Results

CUB200-2011 Birds (200 classes, 11788 images)

Train. anno.   Test anno.   Method                            Accuracy
Parts          Bbox         Göring et al. (2014)              57.8%
Parts          Bbox         Simon et al. (2014)               62.5%
Parts          Bbox         Donahue et al. (2014)             64.9%
Bbox           None         Simon et al. (2014)               53.8%
None           None         Xiao et al. (2015) (VGG19)        77.9%
None           None         Ours, constellation (AlexNet)     68.5%
None           None         No parts (VGG19)                  71.9%
None           None         Ours, random (VGG19)              79.4%
None           None         Ours, constellation (VGG19)       81.0%

Published after this work (and citing it):

None           None         Google 2015                       84.1%
None           None         Baidu 2015                        84.9%


Results

Oxford Flowers (102 classes, 8189 images)

Method                            Accuracy
Angelova and Zhu (2013)           80.7%
Murray and Perronnin (2014)       84.6%
Azizpour et al. (2014)            91.3%
No parts (AlexNet)                90.4%
Ours, random (AlexNet)            90.3 ± 0.2%
Ours, constellation (AlexNet)     91.7%
No parts (VGG19)                  93.1%
Ours, random (VGG19)              94.2 ± 0.2%
Ours, constellation (VGG19)       95.3%

Published after this work (and citing it):

Baidu 2015                        98.7%

NA Birds (555 classes, 48562 images)

Train. anno.   Test anno.   Method                            Accuracy
Parts          Parts        Van Horn et al. (2015)            75.0%
None           None         No parts (GoogLeNet)              63.9%
None           None         Ours, constellation (GoogLeNet)   76.3%


Generic Classification Datasets

The approach is not restricted to fine-grained tasks: it can be applied to generic classification datasets

This is a large advantage over specialized fine-grained approaches

Caltech 256 (256 classes, 30607 images)

Method                                  Accuracy
Zeiler and Fergus (2014)                74.20%
Chatfield et al. (2014)                 78.82%
Simonyan and Zisserman (2014) (VGG19)   85.1%
No parts (AlexNet)                      71.44%
Ours, random (AlexNet)                  72.39%
Ours, constellation (AlexNet)           72.57%
No parts (VGG19)                        82.44%
Ours, constellation (VGG19)             84.10%


Influence of Number of Parts

[Figure: accuracy in % (y-axis) vs. number of parts used (x-axis) on CUB200-2011 Birds, VGG19, 256 available parts; curves: Ours, constellation and Ours, random parts]


Summary

[Figure: CNN part proposals → part selection by constellation model or by random selection]

- Part constellation models for part proposal selection

- 81.0% on CUB200-2011 and 76.3% on NA Birds, without any annotations

More information: http://goo.gl/fz06MU


References

Angelova, A. and Zhu, S. (2013). Efficient object detection and segmentation for fine-grained recognition. In CVPR.

Azizpour, H., Razavian, A. S., Sullivan, J., Maki, A., and Carlsson, S. (2014). From generic to specific deep representations for visual recognition. CoRR, abs/1406.5774.

Chatfield, K., Simonyan, K., Vedaldi, A., and Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In BMVC.

Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., and Darrell, T. (2014). DeCAF: A deep convolutional activation feature for generic visual recognition. In ICML.

Göring, C., Rodner, E., Freytag, A., and Denzler, J. (2014). Nonparametric part transfer for fine-grained recognition. In CVPR.

Murray, N. and Perronnin, F. (2014). Generalized max pooling. In CVPR.

Simon, M., Rodner, E., and Denzler, J. (2014). Part detector discovery in deep convolutional neural networks. In ACCV.

Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556.


Van Horn, G., Branson, S., Farrell, R., Haber, S., Barry, J., Ipeirotis, P., Perona, P., and Belongie, S. (2015). Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In CVPR, pages 595–604.

Xiao, T., Xu, Y., Yang, K., Zhang, J., Peng, Y., and Zhang, Z. (2015). The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In CVPR.

Zeiler, M. D. and Fergus, R. (2014). Visualizing and understanding convolutional networks. In ECCV.


Image References

Bird images are taken from the CUB200-2011 dataset.
