
Fine-grained Classification

Marcel Simon

Computer Vision Group, Department of Mathematics and Computer Science

Friedrich Schiller University Jena, Germany

[email protected]

http://www.inf-cv.uni-jena.de/

Seminar Talk

23.06.2015


Outline

1 Motivation

2 Part Constellation Models

3 Experiments

4 Summary




Motivation

Three birds, but only two species. Which two images show the same species?

High intra-class, low inter-class variance!






Object Parts

Required for every kind of localized features

Problem: identification and robust detection

Additional challenge: ambiguous location






Part Proposals from CNNs

Pretrained CNNs contain inherent part detectors

Part detectors are generic, shared among all classes of ImageNet

Task: unsupervised selection of relevant part detectors for each object category

[Figure: input images are passed through a CNN; its neural activation maps yield 256 part proposals]
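The step from activation maps to part proposals can be sketched as follows. The slide does not say how a location is read off each map; taking the position of the strongest response per channel is an assumption made here for illustration, as are the function name `part_proposals` and the 13 x 13 map size.

```python
import numpy as np

def part_proposals(activations):
    """Return one (row, col) proposal per channel.

    activations: array of shape (C, H, W), e.g. C = 256 channels of a
    convolutional layer. The proposal of a channel is assumed to be the
    location of its maximum activation.
    """
    c, h, w = activations.shape
    flat_idx = activations.reshape(c, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_idx, (h, w))
    return np.stack([rows, cols], axis=1)

# Synthetic activation maps standing in for real CNN outputs.
rng = np.random.default_rng(0)
maps = rng.random((256, 13, 13))
maps[5, 3, 7] = 10.0  # plant a clear peak in channel 5
proposals = part_proposals(maps)
```

Each of the 256 channels then contributes exactly one candidate part location per image.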


Random Part Selection

Assumption: part detectors are generic interest point detectors, each specialized on a specific pattern

For classification, compute features at these regions of interest

[Figure: 256 part proposals, random selection, result: 8 selected parts]
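Random selection amounts to drawing a subset of detector indices without replacement. A minimal sketch, where the function name `select_random_parts` and the fixed seed are assumptions for reproducibility, not part of the talk:

```python
import numpy as np

def select_random_parts(num_proposals=256, num_selected=8, seed=0):
    """Pick num_selected distinct part detector indices uniformly at random."""
    rng = np.random.default_rng(seed)
    chosen = rng.choice(num_proposals, size=num_selected, replace=False)
    return np.sort(chosen)

selected = select_random_parts()
```

The same 8 indices would then be used for every image, so that features from different images stay comparable.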

Model-based Part Selection

Example: which part detector is relevant for birds?

Idea: select parts which fit a part constellation model

[Figure: 256 part proposals are matched against a part constellation model with two views (view 1, view 2); result: 8 selected parts]





Part Constellation Model

Assuming constant normalized distance between parts

Part locations are Gaussian distributed with mean relative to anchor

[Figure: anchor point, mean part locations, and relative offsets]
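One way to write the Gaussian assumption down, with a_i the anchor point in image i, d_{v,p} the mean offset of part p in view v, and µ_{i,p} the observed location of part p in image i. The isotropic, shared variance σ² is an assumption made here for illustration; the slide does not specify the covariance.

```latex
\mu_{i,p} \sim \mathcal{N}\!\left(a_i + d_{v,p},\; \sigma^2 I\right)
\quad\Longrightarrow\quad
-\log p\!\left(\mu_{i,p} \mid a_i, d_{v,p}\right)
  = \frac{1}{2\sigma^2}\,\bigl\| \mu_{i,p} - (a_i + d_{v,p}) \bigr\|^2 + \text{const}
```

Under this assumption, maximizing the likelihood of the part locations is equivalent to minimizing squared distances between observed and predicted part positions.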






Learning Constellation Model

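Given the part proposal locations µ, estimate the part model parameters Γ:

\[
\hat{\Gamma} = \operatorname*{arg\,max}_{\Gamma} \; p(\Gamma \mid \mu)
= \dots
= \operatorname*{arg\,min}_{\Gamma \in M} \sum_{i=1}^{N} \sum_{p=1}^{P} \sum_{v=1}^{V} s_{i,v} \, b_{v,p} \, h_{i,p} \, \bigl\| \mu_{i,p} - (a_i + d_{v,p}) \bigr\|^2
\]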
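The objective can be evaluated directly with array broadcasting. The sketch below assumes the following reading of the indicator variables, which the slide does not spell out: s[i, v] selects the view of image i, b[v, p] marks whether part p is used in view v, and h[i, p] marks whether part p is visible in image i. The function name and array layout are likewise illustrative.

```python
import numpy as np

def constellation_objective(mu, a, d, s, b, h):
    """Sum over i, p, v of s[i,v] * b[v,p] * h[i,p] * ||mu[i,p] - (a[i] + d[v,p])||^2.

    mu: (N, P, 2) part proposal locations
    a:  (N, 2)    anchor point per image
    d:  (V, P, 2) mean offset of each part per view
    s:  (N, V), b: (V, P), h: (N, P) binary indicators (assumed meaning, see text)
    """
    # residual[i, v, p] = mu[i, p] - (a[i] + d[v, p])
    residual = mu[:, None, :, :] - (a[:, None, None, :] + d[None, :, :, :])
    sq_dist = (residual ** 2).sum(axis=-1)                   # shape (N, V, P)
    weights = s[:, :, None] * b[None, :, :] * h[:, None, :]  # shape (N, V, P)
    return float((weights * sq_dist).sum())

# Toy example: 2 images, 1 part, 1 view.
mu = np.array([[[1.0, 1.0]], [[2.0, 3.0]]])
a = np.array([[0.0, 0.0], [1.0, 1.0]])
d = np.array([[[1.0, 1.0]]])
ones = lambda *shape: np.ones(shape)
value = constellation_objective(mu, a, d, ones(2, 1), ones(1, 1), ones(2, 1))
```

In the toy example the first image fits the model exactly and the second is off by one pixel in one coordinate, so only the second image contributes to the sum.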






Solving the Problem

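\[
\hat{\Gamma} = \operatorname*{arg\,min}_{\Gamma \in M} \sum_{i=1}^{N} \sum_{p=1}^{P} \sum_{v=1}^{V} \underbrace{s_{i,v} \, b_{v,p} \, h_{i,p}}_{t_{i,v,p} \in \{0,1\}} \, \bigl\| \mu_{i,p} - (a_i + d_{v,p}) \bigr\|^2
\]

The problem is solved by iteratively optimizing each variable independently. Each variable has an intuitive solution, for example:

\[
d_{v,p} = \frac{\sum_{i=1}^{N} t_{i,v,p} \, (\mu_{i,p} - a_i)}{\sum_{i'=1}^{N} t_{i',v,p}}
\]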
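The update for d_{v,p} is the mean of the anchor-relative part locations over all images where the part is active in that view. A minimal sketch, with illustrative names and array shapes; leaving the offset at zero when no image is active is a choice made here, not something stated in the talk.

```python
import numpy as np

def update_offsets(mu, a, t):
    """d[v, p] = sum_i t[i,v,p] * (mu[i,p] - a[i]) / sum_i t[i,v,p].

    mu: (N, P, 2) part locations, a: (N, 2) anchors,
    t:  (N, V, P) binary indicators t[i,v,p] = s[i,v] * b[v,p] * h[i,p].
    """
    t = np.asarray(t, dtype=float)
    rel = mu - a[:, None, :]                  # anchor-relative locations, (N, P, 2)
    num = np.einsum("ivp,ipk->vpk", t, rel)   # weighted sums, (V, P, 2)
    count = t.sum(axis=0)[:, :, None]         # number of active images, (V, P, 1)
    d = np.zeros_like(num)
    np.divide(num, count, out=d, where=count > 0)  # stays zero where count == 0
    return d

# Toy example: the third image is inactive and must not affect the mean.
mu = np.array([[[2.0, 2.0]], [[4.0, 6.0]], [[100.0, 100.0]]])
a = np.zeros((3, 2))
t = np.array([[[1]], [[1]], [[0]]])
d = update_offsets(mu, a, t)
```

In a full alternating scheme, this step would be interleaved with analogous updates of the anchors and the indicator variables until the objective stops decreasing.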






Classification Pipeline

[Figure: pipeline with a part selection stage and a feature extraction stage: part proposals, part selection, detect parts, feature extraction, SVM]
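The feature extraction stage can be sketched as cropping a patch around each detected part and concatenating the per-part features into one descriptor, which would then be fed to the SVM. Everything specific here is an assumption for illustration: the helper names, the patch size of 32 pixels, and the mean-color `feature_fn` that stands in for CNN features.

```python
import numpy as np

def crop_patch(image, center, size):
    """Crop a size x size patch around center (row, col), shifted to stay inside the image."""
    h, w = image.shape[:2]
    half = size // 2
    r0 = int(np.clip(center[0] - half, 0, max(h - size, 0)))
    c0 = int(np.clip(center[1] - half, 0, max(w - size, 0)))
    return image[r0:r0 + size, c0:c0 + size]

def image_descriptor(image, part_locations, feature_fn, patch_size=32):
    """Concatenate the features of all part patches into one vector."""
    feats = [feature_fn(crop_patch(image, loc, patch_size)) for loc in part_locations]
    return np.concatenate(feats)

# Stand-in feature: mean color of the patch (a real system would use CNN features).
mean_color = lambda patch: patch.mean(axis=(0, 1))

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
locations = rng.integers(0, 64, size=(8, 2))  # 8 detected part locations
descriptor = image_descriptor(image, locations, mean_color)
```

With 8 parts and a 3-dimensional stand-in feature, the descriptor has 24 entries; the order of parts must be the same for every image so that the classifier sees aligned dimensions.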

Results

CUB200-2011 Birds

200 classes, 11788 images

Train anno.  Test anno.  Method                          Accuracy
Parts        Bbox        Göring et al. (2014)            57.8%
Parts        Bbox        Simon et al. (2014)             62.5%
Parts        Bbox        Donahue et al. (2014)           64.9%
Bbox         None        Simon et al. (2014)             53.8%
None         None        Xiao et al. (2015) (VGG19)      77.9%
None         None        Ours, constellation (AlexNet)   68.5%
None         None        No parts (VGG19)                71.9%
None         None        Ours, random (VGG19)            79.4%
None         None        Ours, constellation (VGG19)     81.0%

After publication with citation:

None         None        Google 2015                     84.1%
None         None        Baidu 2015                      84.9%





Results

Oxford flowers

102 classes, 8189 images

Method                          Accuracy
Angelova and Zhu (2013)         80.7%
Murray and Perronnin (2014)     84.6%
Azizpour et al. (2014)          91.3%
No parts (AlexNet)              90.4%
Ours, random (AlexNet)          90.3 ± 0.2%
Ours, constellation (AlexNet)   91.7%
No parts (VGG19)                93.1%
Ours, random (VGG19)            94.2 ± 0.2%
Ours, constellation (VGG19)     95.3%

After publication with citation:

Baidu 2015                      98.7%

NA Birds

555 classes, 48562 images

Train anno.  Test anno.  Method                      Accuracy
Parts        Parts       Van Horn et al. (2015)      75.0%
None         None        No parts (GoogLeNet)        63.9%
None         None        Ours, const. (GoogLeNet)    76.3%






Generic Classification Datasets

Approach applicable to all classification datasets

This is a large step compared to specialized fine-grained approaches

Caltech 256

256 classes, 30607 images

Method                                  Accuracy
Zeiler and Fergus (2014)                74.20%
Chatfield et al. (2014)                 78.82%
Simonyan and Zisserman (2014) (VGG19)   85.1%
No parts (AlexNet)                      71.44%
Ours, random (AlexNet)                  72.39%
Ours, constellation (AlexNet)           72.57%
No parts (VGG19)                        82.44%
Ours, constellation (VGG19)             84.10%






Influence of Number of Parts

[Figure: accuracy in % (y-axis, ticks 70 to 80) against the number of parts used (x-axis, 0 to 250) on CUB200-2011 Birds with VGG19 and 256 available parts; curves: "Ours, constellation" and "Ours, random parts"]






Summary

[Figure: CNN part proposals, selected either by a constellation model or by random selection]

- Part constellation models for part proposal selection

- 81.0% on CUB200-2011 and 76.3% on NA Birds, without annotations

More information: http://goo.gl/fz06MU





References

Angelova, A. and Zhu, S. (2013). Efficient object detection and segmentation for fine-grained recognition. In CVPR.

Azizpour, H., Razavian, A. S., Sullivan, J., Maki, A., and Carlsson, S. (2014). From generic to specific deep representations for visual recognition. CoRR, abs/1406.5774.

Chatfield, K., Simonyan, K., Vedaldi, A., and Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In BMVC.

Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., and Darrell, T. (2014). Decaf: A deep convolutional activation feature for generic visual recognition. In ICML.

Göring, C., Rodner, E., Freytag, A., and Denzler, J. (2014). Nonparametric part transfer for fine-grained recognition. In CVPR.

Murray, N. and Perronnin, F. (2014). Generalized max pooling. In CVPR.

Simon, M., Rodner, E., and Denzler, J. (2014). Part detector discovery in deep convolutional neural networks. In ACCV.

Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556.

Van Horn, G., Branson, S., Farrell, R., Haber, S., Barry, J., Ipeirotis, P., Perona, P., and Belongie, S. (2015). Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In CVPR, pages 595–604.

Xiao, T., Xu, Y., Yang, K., Zhang, J., Peng, Y., and Zhang, Z. (2015). The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In CVPR.

Zeiler, M. D. and Fergus, R. (2014). Visualizing and understanding convolutional networks. In ECCV.





Image References

Bird images are taken from the CUB200-2011 Dataset

