Andrew Zisserman Talk - Part 1b

Example

Spatial re-ranking:

• improves precision

• but not recall …

[Figure: query images and their precision plots]

• high precision at low recall (like google)

• variation in performance across queries

• none retrieve all instances

Obtaining visual words is like a sensor measuring the image

“Noise” in the measurement process means that some visual words are missing or incorrect, e.g. due to

• Missed detections

• Changes beyond built-in invariance

• Quantization effects

Consequence: Visual word in query is missing in target image

Why aren’t all objects retrieved?

[Figure: the query image and a target image, each processed as Hessian-Affine regions + SIFT descriptors → set of SIFT descriptors → clustered and quantized to visual words → sparse frequency vector]

1. Query expansion

2. Better quantization

Query expansion

In text:

• Reissue top N results as queries

• Pseudo/blind relevance feedback

• Danger of topic drift – this is a big problem for text

Query Expansion

Original query: Hubble Telescope Achievements

Example from: Jimmy Lin, University of Maryland

Query expansion: Select the top 20 terms from the top 20 documents according to tf-idf

Added terms:

Telescope, hubble, space, nasa, ultraviolet, shuttle, mirror, telescopes, earth, discovery, orbit, flaw, scientists, launch, stars, universe, mirrors, light, optical, species
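The term-selection step in this example can be sketched as summing tf-idf scores over the top-ranked documents and keeping the highest-scoring terms. This is a minimal illustration, assuming pre-tokenized documents and a plain tf × log(N/df) weighting; the helper name `expansion_terms` and the exact scoring are hypothetical, not taken from the cited example.

```python
import math
from collections import Counter


def expansion_terms(top_docs, corpus, n_terms=20):
    """Pick expansion terms by summing tf-idf over the top-ranked documents.

    top_docs: list of token lists (the top N results of the original query)
    corpus:   list of token lists used to estimate document frequencies
    Assumption: tf = count / doc length, idf = log(N / df).
    """
    n_docs = len(corpus)
    # Document frequency: number of corpus documents containing each term.
    df = Counter(term for doc in corpus for term in set(doc))
    scores = Counter()
    for doc in top_docs:
        tf = Counter(doc)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term]) if df[term] else 0.0
            scores[term] += (count / len(doc)) * idf
    # Highest aggregated tf-idf first.
    return [term for term, _ in scores.most_common(n_terms)]
```

Terms that occur in every document get zero idf, which is one reason very common words rarely survive as expansion terms.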

Query Expansion: Text


In vision:

• Reissue spatially verified image regions as queries

• Spatial verification like an oracle of truth

Query Expansion

Visual query expansion - overview

1. Original query

2. Initial retrieval set

3. Spatial verification (the oracle)

4. New enhanced query

5. Additional retrieved images
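One simple way to form the new enhanced query in step 4 is to average the bag-of-words vectors of the query and of the spatially verified results, the "average query expansion" variant described by Chum et al. (2007). A minimal sketch, assuming sparse tf-idf vectors stored as {word_id: weight} dictionaries and that spatial verification has already selected the verified images; `average_query_expansion` and `l2_normalize` are hypothetical names.

```python
import math


def l2_normalize(vec):
    """Return a copy of a sparse vector {word_id: weight} scaled to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {k: w / norm for k, w in vec.items()} if norm > 0 else dict(vec)


def average_query_expansion(query_vec, verified_vecs):
    """Average the query with its spatially verified results.

    verified_vecs should contain only images that passed spatial verification;
    unverified results are never added, which limits topic drift.
    """
    vecs = [l2_normalize(query_vec)] + [l2_normalize(v) for v in verified_vecs]
    avg = {}
    for vec in vecs:
        for word, weight in vec.items():
            avg[word] = avg.get(word, 0.0) + weight / len(vecs)
    # The expanded query is re-issued against the inverted file.
    return l2_normalize(avg)
```

Visual words that were missing from the query but present in verified views of the same object now appear in the expanded query, which is how step 5 can retrieve images the original query missed.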

[Figure: query image; originally retrieved image; image originally not retrieved]

What Query Expansion Adds


Bag of visual words particular object retrieval

1. Query image: Hessian-Affine regions + SIFT descriptors give a set of SIFT descriptors

2. Visual words + tf-idf weighting: descriptors are assigned to centroids (visual words), giving a sparse frequency vector

3. Querying: the inverted file returns a ranked image short-list

4. Geometric verification [Lowe 04, Chum & al 2007]

5. Query expansion [Chum & al 2007]

[Figure: query image; originally retrieved; retrieved only after expansion]
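The "visual words + tf-idf weighting" and "inverted file" stages amount to standard text-retrieval scoring. A minimal sketch, assuming each image has already been reduced to a list of visual-word IDs and using cosine similarity of L2-normalized tf-idf vectors; the class `InvertedFile` is a hypothetical illustration, and its ranked short-list would still go through geometric verification.

```python
import math
from collections import Counter, defaultdict


class InvertedFile:
    """Toy inverted file: visual word -> list of (image_id, tf-idf weight)."""

    def __init__(self, images):
        # images: dict image_id -> list of visual-word ids
        n = len(images)
        df = Counter(w for words in images.values() for w in set(words))
        self.idf = {w: math.log(n / d) for w, d in df.items()}
        self.postings = defaultdict(list)
        for image_id, words in images.items():
            for w, weight in self._tfidf(words).items():
                self.postings[w].append((image_id, weight))

    def _tfidf(self, words):
        """L2-normalized tf-idf vector of a bag of visual words."""
        tf = Counter(words)
        vec = {w: (c / len(words)) * self.idf.get(w, 0.0) for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {w: v / norm for w, v in vec.items()} if norm > 0 else {}

    def query(self, words, top_k=10):
        """Return a short-list of (image_id, cosine score), best first."""
        scores = defaultdict(float)
        for w, q_weight in self._tfidf(words).items():
            # Only images sharing a visual word with the query are touched,
            # which is what makes the sparse representation fast.
            for image_id, weight in self.postings.get(w, []):
                scores[image_id] += q_weight * weight
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]
```

With a large vocabulary each posting list is short, so a query only touches a small fraction of the database.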

Query Expansion

[Figure: query image; original results (good); expanded results (better)]

Better Quantization

Problems arising from quantization

• Typically, quantization has a significant impact on the final performance of the system [Sivic03,Nister06,Philbin07]

• Quantization errors split features that should be grouped together and confuse features that should be separated

[Figure: Voronoi cells]

i. Points 3 and 4 are close, but never matched (they fall in different cells)

ii. Points 1, 2 and 3 are matched equally (they fall in the same cell)

And more …

Overcoming quantization errors

• Soft-assign each descriptor to multiple cluster centers

• Assignment weight according to Gaussian on distance

• Normalize weights to sum to one

[Philbin et al. CVPR 2008, Van Gemert et al. ECCV 2008]

Example (figure): hard assignment gives B: 1.0; soft assignment gives A: 0.1, B: 0.5, C: 0.4
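The three steps listed above (nearest cluster centres, Gaussian weight on distance, normalization) can be sketched as follows. This assumes Euclidean distances, brute-force search over the centres, a fixed number `k` of nearest centres and a bandwidth `sigma`; the default values are illustrative assumptions, not the settings used in the cited papers.

```python
import math


def soft_assign(descriptor, centres, k=3, sigma=1.0):
    """Soft-assign one descriptor to its k nearest cluster centres.

    Returns {centre_index: weight}; the weights sum to one.
    Assumptions: Euclidean distance, brute-force search, illustrative k and sigma.
    """
    # Squared Euclidean distance from the descriptor to every centre.
    dists = [sum((a - b) ** 2 for a, b in zip(descriptor, c)) for c in centres]
    nearest = sorted(range(len(centres)), key=lambda i: dists[i])[:k]
    # Gaussian weight exp(-d^2 / (2 sigma^2)); subtracting the smallest d^2
    # rescales all weights by the same factor (cancelled by the normalization
    # below) but avoids underflow to zero for distant descriptors.
    d0 = dists[nearest[0]]
    raw = {i: math.exp(-(dists[i] - d0) / (2 * sigma ** 2)) for i in nearest}
    total = sum(raw.values())
    return {i: w / total for i, w in raw.items()}
```

With `k=1` the nearest centre receives weight 1.0, which is the hard-assignment case.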

Learning a vocabulary to overcome quantization errors [Mikulik et al. ECCV 2010, Philbin et al. ECCV 2010]

Several other solutions are possible …

e.g. Hamming embedding [Jegou & Schmid ECCV 2008]

• Standard quantization using bag-of-visual-words

• Additional localization in the Voronoi cell by a binary signature
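A rough sketch of the Hamming-embedding idea: descriptors are quantized as usual, a short binary signature then locates each descriptor within its Voronoi cell, and two features match only if they share a visual word and their signatures are close in Hamming distance. This assumes a random Gaussian projection (a stand-in for the projection used in the paper) and per-cell, per-bit median thresholds learned from training descriptors; all function names, `n_bits` and `max_hamming` are hypothetical choices for illustration.

```python
import random
import statistics


def make_projection(dim, n_bits, seed=0):
    """Random Gaussian projection matrix with n_bits rows (an assumed stand-in)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def project(projection, x):
    """Project descriptor x onto each row of the projection matrix."""
    return [sum(p * v for p, v in zip(row, x)) for row in projection]


def learn_thresholds(projection, cell_descriptors):
    """Per-bit median of the projected training descriptors of one Voronoi cell."""
    projected = [project(projection, x) for x in cell_descriptors]
    return [statistics.median(col) for col in zip(*projected)]


def signature(projection, thresholds, x):
    """Binary signature: bit b is 1 if projected component b exceeds its threshold."""
    return [int(v > t) for v, t in zip(project(projection, x), thresholds)]


def he_match(word_a, sig_a, word_b, sig_b, max_hamming=2):
    """Match only if same visual word AND signatures differ in <= max_hamming bits."""
    if word_a != word_b:
        return False
    return sum(a != b for a, b in zip(sig_a, sig_b)) <= max_hamming
```

The signature refines the coarse visual word: two descriptors in the same large cell but far apart within it will usually disagree on many bits and be rejected.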

• More on methods of soft assignment tomorrow

Soft Assignment: Implementation

Bag of words: score a match between two features by the scalar product of their weight vectors

Spatial re-ranking: also score the number of inliers using this measure
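With soft assignment each feature carries a sparse weight vector over visual words rather than a single word, so a feature match is scored by the scalar product of the two vectors, and spatial re-ranking can sum these scores over the inliers instead of simply counting them. A minimal sketch, assuming weight vectors stored as {word_id: weight} dictionaries; the function names are hypothetical.

```python
def feature_match_score(weights_a, weights_b):
    """Scalar product of two sparse soft-assignment weight vectors."""
    # Iterate over the smaller vector; only shared visual words contribute.
    if len(weights_a) > len(weights_b):
        weights_a, weights_b = weights_b, weights_a
    return sum(w * weights_b.get(word, 0.0) for word, w in weights_a.items())


def weighted_inlier_score(inlier_pairs, query_feats, db_feats):
    """Spatial re-ranking score: sum of feature match scores over the inliers.

    inlier_pairs: (query_feature_index, db_feature_index) pairs that survived
    geometric verification; query_feats / db_feats: lists of weight vectors.
    """
    return sum(feature_match_score(query_feats[i], db_feats[j]) for i, j in inlier_pairs)
```

With hard assignment every weight vector is a single entry of 1.0, so each match scores 1 or 0 and the re-ranking score reduces to a plain inlier count.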

Soft Assignment: Results

Benefit 1: Helping Query Expansion

Hard Assignment

Only one good initial result – QE doesn't significantly improve results


Soft Assignment

4 good results – allows query expansion to return these results in addition to the ones above

Soft Assignment: Results

Benefit 2: Better spatial localization

[Figure: hard assignment vs. soft assignment]

Results: Baseline to State of the Art

Mean Average Precision (mAP):

1. Baseline Method (K = 10K): 0.389

2. Large Vocabulary (K = 1M): 0.618

3. Spatial Re-ranking: 0.653

4. Soft Assignment (SA): 0.731

5. Query Expansion (QE): 0.801

6. SA & QE: 0.825
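These figures are mean average precision: average precision (AP) is computed from each query's ranked list and then averaged over all queries. A minimal sketch of non-interpolated AP; this is an assumption about the exact formula, and the official Oxford Buildings evaluation may differ in detail (for example, in how it treats "junk" images). The function names are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated AP: mean of precision@k over the ranks k of relevant items."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant:
            hits += 1
            precision_sum += hits / k
    # Relevant images never retrieved contribute zero precision, so AP
    # penalizes low recall as well as badly ordered results.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, g) for r, g in runs) / len(runs)
```

Because missed relevant images count as zero, methods that raise recall (query expansion, soft assignment) can raise mAP even when the top of the list was already correct.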

Disadvantages of soft assignment?

Outline

1. Object recognition cast as nearest neighbour matching

2. Object recognition cast as text retrieval

3. Large scale search and improving performance

4. Applications

• accessing expert knowledge, data mining, inpainting, location search, large scale reconstruction, mobile apps, …

5. The future and challenges

Application

Accessing expert knowledge:

• Use an image query to access an annotated dataset

• Search with a query image, retrieve the annotation

Visual Access to Classical Art Archives

Currently: 111 thousand Greek vase images

http://explore.clarosnet.org/XDB/ASP/clarosHome/

Application: Object Mining in Large Datasets

Objective …

Automatically find and group images of same object/scene

Motivation

Applications:

• Dataset summarization

• Efficient retrieval

• Efficient pre-processing for automatic 3-D reconstruction (e.g. PhotoSynth)

Matching Graph

Build a ‘matching graph’ over all the images in the dataset

Each image is a node and a link represents two images having some object in common

Given this graph structure, apply various clustering algorithms to group the data

Finding Commonly Occurring Objects

Simple idea: a match satisfying strong spatial constraints gives a 'link' between two images

Edge strength = # inliers

Finding Commonly Occurring Objects

Use these links to build up a graph over all images in the dataset

Nodes = images, edges = spatially verified matches

Building the Matching Graph

• Use each image to query the dataset

• Each query gives a list of results scored by a measure of the spatial consistency to the query

• Threshold this consistency measure to determine the links in the matching graph
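The graph-building steps above can be sketched as follows, assuming a retrieval function that returns (image_id, inlier_count) pairs for a query image. `search_with_verification` is a hypothetical stand-in for the full retrieval and spatial-verification pipeline, and the default threshold of 20 inliers follows the value reported for the Statue of Liberty timings.

```python
from collections import defaultdict


def build_matching_graph(image_ids, search_with_verification, min_inliers=20):
    """Query the dataset with every image and keep strongly verified matches as edges.

    search_with_verification(image_id) must return an iterable of
    (other_image_id, inlier_count) pairs (hypothetical interface).
    Returns an adjacency dict: image_id -> {neighbour_id: edge_strength}.
    """
    graph = defaultdict(dict)
    for query_id in image_ids:
        for other_id, inliers in search_with_verification(query_id):
            if other_id == query_id or inliers < min_inliers:
                continue  # skip self-matches and weakly verified results
            # Edge strength = number of inliers; if the pair is found from both
            # directions, keep the larger count.
            strength = max(inliers, graph[query_id].get(other_id, 0))
            graph[query_id][other_id] = strength
            graph[other_id][query_id] = strength
    return graph
```

Images with no link above the threshold do not appear as keys, so later steps should iterate over the full list of image IDs rather than over the graph.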

Connected Components

In a collection of images of multiple disjoint objects we expect the matching graph to also be disjoint

A simple first step is to take connected components of the matching graph and examine the clusters returned
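Connected components of the matching graph are cheap to compute; one standard way is a union-find (disjoint-set) structure over the edges. A minimal sketch, assuming edges are given as pairs of image IDs that all appear in `image_ids`; the `min_size` filter mirrors reporting only clusters above a given size, and its default is an illustrative assumption.

```python
def connected_components(image_ids, edges, min_size=1):
    """Group images into connected components of the matching graph.

    edges: iterable of (image_a, image_b) pairs that passed the inlier threshold;
    every endpoint must be in image_ids.
    Returns a list of sets, largest first, keeping those with >= min_size images.
    """
    parent = {i: i for i in image_ids}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    groups = {}
    for i in image_ids:
        groups.setdefault(find(i), set()).add(i)
    return sorted((g for g in groups.values() if len(g) >= min_size), key=len, reverse=True)
```

A single spurious edge is enough to merge two components, which is the weakness discussed under connecting images.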

Connected Components

Example: five connected components from the Oxford dataset

Component sizes (figure): 56, 71, 26, 25 and 56 images

Connected Components

A problem with connected components is that ‘connecting images’ can sometimes join two disjoint objects

[Figure: linking images]

This problem can be overcome with a divide-and-merge strategy

Datasets

Statue of Liberty dataset (37,034 images)

• Crawled from Flickr by querying for ‘statue of liberty’

• Lots of images of the Statue of Liberty, but also of New York and other sites

Rome dataset (1,021,986 images) [1]

• Again, crawled from Flickr

• Contains too much stuff to mention

[1] Photo tourism: Exploring photo collections in 3D, Noah Snavely, Steven M. Seitz, Richard Szeliski

Results: Statue of Liberty

Largest cluster – 8461 images of the Statue of Liberty

2nd largest – 276 aerial views of New York

3rd largest – 80 American flags

Smaller clusters

Lego Statue of Liberty 59 images

Staten Island 52 images

Results: Rome

Largest clusters (figure): 18676, 15818, 9632 and 4869 images

Timings

21,339 high resolution images from Flickr tagged with 'statue of liberty'

Querying with every image in the database to build the graph takes ~2 hours

Finding connected components (very quick) using a threshold of 20 spatially verified inliers gives 11 clusters with more than 20 images

As an aside … Better matching with fewer features [Turcot & Lowe, ICCV Workshop 2009]

• Build a matching graph

• Augment image bag-of-words histograms using neighbours

• Like query expansion, but done in advance on the ‘server side’
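The augmentation step can be sketched as adding each image's neighbours' visual-word histograms to its own, so that words missing from one view are supplied by other verified views of the same object. This is a simplified illustration, assuming sparse histograms stored as `Counter` objects and the matching graph as an adjacency dict; Turcot & Lowe's actual method may differ in detail (for example, in which features it keeps), and `augment_histograms` is a hypothetical name.

```python
from collections import Counter


def augment_histograms(histograms, graph):
    """Return augmented bag-of-words histograms.

    histograms: image_id -> Counter of visual-word counts
    graph:      image_id -> iterable of neighbour image_ids (matching graph)
    Each image's histogram becomes its own counts plus those of its neighbours.
    Computed once, offline, so queries need no extra retrieval pass.
    """
    augmented = {}
    for image_id, hist in histograms.items():
        combined = Counter(hist)  # copy, so the input histograms are not modified
        for neighbour in graph.get(image_id, ()):
            combined.update(histograms.get(neighbour, {}))  # add neighbour counts
        augmented[image_id] = combined
    return augmented
```

The design trade-off relative to query expansion is that the extra work moves from query time to indexing time, at the cost of larger stored histograms.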

Application: Internet-based inpainting

Photo-editing using images of the same place [Whyte, Sivic and Zisserman, 2009], but see also [Hays and Efros, 2007].

Application: place recognition (retrieval in a structured (on a map) database)

[Knopp, Sivic, Pajdla, ECCV 2010] http://www.di.ens.fr/willow/research/confusers/

[Figure: place-recognition system diagram with labels: Query Expansion (Panoramio, Flickr, …); image database; confuser suppression (only negative training data, from geotags); optimized image database; image indexing with spatial verification; best match. Example panels: correctly recognized examples; more correctly recognized examples]

Application: Matching and 3D reconstruction in large unstructured datasets.

Building Rome in a Day, Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz and Richard Szeliski, International Conference on Computer Vision, 2009

http://grail.cs.washington.edu/rome/

See also [Havlena, Torii, Knopp and Pajdla, CVPR 2009].

Figure: N. Snavely

Example of the final 3D point cloud and cameras: 57,845 downloaded images, 11,868 registered images. This video: 4,619 images.

The Old City of Dubrovnik

Bing visual scan

Application: Mobile visual search apps

and others… Snaptell.com, Moodstocks.com

Example

Slide credit: I. Laptev

Papers and Demos

Sivic, J. and Zisserman, A. Video Google: A Text Retrieval Approach to Object Matching in Videos. Proceedings of the International Conference on Computer Vision (2003). http://www.robots.ox.ac.uk/~vgg/publications/papers/sivic03.pdf

Demo: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/

Chum, O., Philbin, J., Isard, M., Sivic, J. and Zisserman, A. Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval. Proceedings of the International Conference on Computer Vision (2007). http://www.robots.ox.ac.uk/~vgg/publications/papers/chum07b.pdf

Demo: http://www.robots.ox.ac.uk/~vgg/research/oxbuildings/

Philbin, J. and Zisserman, A. Object Mining using a Matching Graph on Very Large Image Collections. Proceedings of the Indian Conference on Vision, Graphics and Image Processing (2008). http://www.robots.ox.ac.uk/~vgg/publications/papers/philbin08b.pdf
