Spot the Dog
An overview of semantic retrieval of unannotated images in the Semantic Gap project
Semantic Image Retrieval - The User Perspective
Jonathon Hare
Intelligence, Agents, Multimedia Group
School of Electronics and Computer Science, University of Southampton
jsh2@ecs.soton.ac.uk
The previous talks have described the issues associated with image retrieval from the practitioner's perspective, a problem that has become known as the 'semantic gap' in image retrieval.
This presentation explores how novel computational and mathematical techniques can help improve content-based multimedia search by enabling textual search of unannotated imagery.
Introduction
Unannotated Imagery
Manually constructing metadata in order to index images is expensive.
Perhaps US$1-$5 per image for simple keywording.
More for archival quality metadata (keywords, caption, title, description, dates, times, events).
The number of images grows every day.
In many domains, manually indexing everything is an impossible task!
Unannotated Imagery - An Example
Kennel club image collection.
relatively small (~60,000 images)
~7000 of those digitised.
~3000 of those have subject metadata (mostly keywords), remainder have little/no information.
Each year, after the Crufts dog show, they expect to receive on the order of a few thousand additional (digital) images with little, if any, metadata other than date/time (and only then if the camera is set up correctly).
An Overview of Our Approach
Conceptually simple idea: Teach a machine to learn the relationship between visual features of images and the metadata that describes them.
So, two stages:
Use exemplar image/metadata pairs to learn relationships.
Project learnt relationships to images without metadata in order to make them searchable.
Modelling Visual Information
To model the visual content of an image, we can extract descriptors, or feature-vectors, from it.
Feature-vectors can describe many differing aspects of the image content.
Low level features:
Fourier transforms, wavelet decomposition, texture histograms, colour histograms, shape primitives, filter primitives, etc.
Higher-level features:
Faces, objects, etc.
Visual Term Representations
A modern approach to modelling the content of an image is to treat it like a textual document.
Model image as a collection of “visual terms”.
Analogous to words in a text document.
Feature-vectors can be transformed into visual terms through some mapping.
Visual Term Representations - Bag-of-Terms
For indexing purposes, we often discount the order/arrangement of terms and just count the number of occurrences of each.
"The quick brown fox jumped over the lazy dog"

brown  dog  fox  jumped  lazy  over  quick  the
  1     1    1     1       1     1     1     2

[Figure: an image represented analogously by its visual-term occurrence vector, e.g. [1 2 0 0 6].]
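As a concrete illustration of the bag-of-terms idea, this minimal Python sketch computes the term-occurrence counts for the sentence above:

    from collections import Counter

    sentence = "The quick brown fox jumped over the lazy dog"
    # Discard order/arrangement: lower-case, split, and count occurrences.
    bag = Counter(sentence.lower().split())
    print(sorted(bag.items()))
    # [('brown', 1), ('dog', 1), ('fox', 1), ('jumped', 1),
    #  ('lazy', 1), ('over', 1), ('quick', 1), ('the', 2)]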
Visual Term Representations - Example: Global Colour Visual Terms
A common way of indexing the global colours used in an image is the colour histogram.
Each bin of the histogram counts the number of pixels whose colour falls within the range represented by that bin.
The colour histogram can thus be used directly as a term occurrence vector in which each bin is represented as a visual term.
[Figure: an example image and its 16-bin colour histogram, giving the term-occurrence vector (1569, 3408, 491, 0, 0, 902, 2146, 5026, 0, 0, 56, 3633, 0, 0, 0, 6827).]
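A minimal sketch of this mapping, assuming Pillow and NumPy are available; the (4, 2, 2) per-channel quantisation giving 16 bins is illustrative, not necessarily the exact scheme used in the talk:

    import numpy as np
    from PIL import Image  # assumption: Pillow is installed

    def colour_histogram_terms(path, levels=(4, 2, 2)):
        """Map an image to a global-colour term-occurrence vector.

        Each bin of a joint RGB histogram acts as one 'visual term';
        the bin count is that term's number of occurrences.
        """
        pixels = np.asarray(Image.open(path).convert("RGB"),
                            dtype=np.uint32).reshape(-1, 3)
        # Quantise each channel independently to the requested levels.
        q = np.empty_like(pixels)
        for c, n in enumerate(levels):
            q[:, c] = pixels[:, c] * n // 256
        # Flatten the 3-D bin index into a single visual-term id.
        term_ids = (q[:, 0] * levels[1] + q[:, 1]) * levels[2] + q[:, 2]
        return np.bincount(term_ids, minlength=int(np.prod(levels)))

Calling colour_histogram_terms on an image returns a 16-element term-occurrence vector like the one in the figure above.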
Visual Term Representations - Example: Local Interest-Point Based Visual Terms
Features based on Lowe’s difference-of-Gaussian region detector and SIFT feature vector.
A vocabulary of exemplar feature-vectors is learnt by applying k-means clustering to a training set of features.
Feature-vectors can then be quantised to discrete visual terms by finding the closest exemplar in the vocabulary.
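A sketch of these two steps, assuming OpenCV (4.4+, where SIFT lives in the main module) for feature extraction and scikit-learn for clustering; the vocabulary size of 3000 and the training_paths list are illustrative:

    import cv2                          # assumption: opencv-python >= 4.4
    import numpy as np
    from sklearn.cluster import KMeans  # assumption: scikit-learn installed

    sift = cv2.SIFT_create()

    def sift_descriptors(path):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, descs = sift.detectAndCompute(img, None)
        return descs

    # 1. Learn a vocabulary of exemplar feature-vectors by clustering the
    #    features of a training set (training_paths is a hypothetical list).
    training = np.vstack([d for p in training_paths
                          if (d := sift_descriptors(p)) is not None])
    vocab = KMeans(n_clusters=3000).fit(training)

    # 2. Quantise a new image's feature-vectors to discrete visual terms by
    #    finding the closest exemplar (cluster centre) for each one.
    def visual_terms(path):
        return vocab.predict(sift_descriptors(path))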
Semantic Spaces
Basic idea: Create a large multidimensional space in which images, keywords (or other metadata) and visual terms can be placed.
In the training stage, we learn how keywords are related to visual terms and images.
Related visual terms, images and keywords are placed close together within the space.
In the projection stage, unannotated images are placed in the space based upon the visual terms they contain.
The placement should be such that they lie near keywords that describe them.
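One way to realise such a space, in the spirit of latent semantic analysis, is a truncated SVD of a combined term-by-image occurrence matrix. This is a minimal sketch under that assumption, and the construction used in the talk may differ in detail; V, K and k below are hypothetical names:

    import numpy as np

    # Hypothetical training-stage inputs:
    #   V (n_vterms   x n_images): visual-term occurrence counts
    #   K (n_keywords x n_images): keyword occurrences (from metadata)
    O = np.vstack([V, K])

    # A truncated SVD places everything in one k-dimensional space:
    # rows of U position the visual terms and keywords, and the
    # columns of Vt (rows of Vt.T) position the training images.
    U, s, Vt = np.linalg.svd(O, full_matrices=False)
    k = 50                        # illustrative dimensionality
    term_pos = U[:, :k]
    image_pos = Vt[:k].T

    # Projection stage: fold an unannotated image into the space using
    # only its visual-term vector (its keyword counts are all zero).
    def project(vterm_vector):
        return vterm_vector @ U[:len(vterm_vector), :k] / s[:k]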
Semantic Spaces - Conceptual Overview
[Figure: conceptual diagram of images, keywords and visual terms positioned in the semantic space.]
Semantic Spaces - Uses of the Space
Once constructed, the semantic space has a number of uses:
Finding images (both annotated and unannotated) by keyword(s)/metadata.
Finding images (both annotated and unannotated) by semantically similar images.
Determining likely metadata for an image.
Examining keyword-keyword and keyword-visual term relationships.
Segmenting an image.
Semantic Spaces - Searching by Keyword
[Figure: build slides of a search for images about "SUN" in a space containing the keywords SUN and TRAIN; the images nearest to SUN are returned as ranked search results.]
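A sketch of keyword search in the space: rank every image, annotated or not, by its cosine similarity to the query keyword's position. The names reuse the hypothetical term_pos/image_pos from the earlier SVD sketch:

    import numpy as np

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def search_by_keyword(keyword, keyword_index, term_pos, image_pos, image_ids):
        """Return image ids ranked by closeness to a keyword in the space."""
        q = term_pos[keyword_index[keyword]]        # the keyword's position
        scores = [cosine(q, p) for p in image_pos]  # one score per image
        return [i for _, i in sorted(zip(scores, image_ids), reverse=True)]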
Semantic Spaces - Searching by Image
[Figure: build slides of a "search for images like this" query; the query image is placed in the space and the nearest images are returned as ranked search results.]
Semantic Spaces - Suggesting Keywords
[Figure: build slides of an image projected into a space containing the keywords SUN, SKY, MOUNTAIN, TREE and CAR; the nearest keywords are returned as ranked suggestions: SKY, MOUNTAIN, TREE, SUN, CAR.]
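Suggesting keywords is the symmetric operation: project the unannotated image into the space and return the nearest keyword positions. A sketch, again reusing the hypothetical project and cosine helpers from the earlier snippets:

    def suggest_keywords(vterm_vector, keywords, keyword_pos, n=5):
        """Return the n keywords closest to an image in the semantic space."""
        p = project(vterm_vector)            # place the image in the space
        scores = [cosine(p, kp) for kp in keyword_pos]
        ranked = sorted(zip(scores, keywords), reverse=True)
        return [kw for _, kw in ranked[:n]]  # e.g. SKY, MOUNTAIN, TREE, ...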
Semantic Spaces - Experimental Retrieval Results (Corel Dataset)
Colour Histograms used as visual terms (each bin representing a single term).
Standard experimental collection: 500 test images, 4500 training images.
The results are quite impressive: roughly comparable with the Machine Translation auto-annotation technique (but remember we are using much simpler image features).
It works well for query keywords that are easily associated with a particular set of colours, but not so well for other keywords.
Semantic Spaces - Experimental Retrieval Results (Corel Dataset)
[Figure: top 15 images when querying for 'sun']
[Figure: top 15 images when querying for 'horse']
[Figure: top 15 images when querying for 'foals']
Demo - The K9 Retrieval System
We have built a demonstration system around the semantic space idea and applied it to images from the Kennel Club picture library (>7000 images, ~3000 with keywords).
The system allows annotated images to be retrieved by keywords and concepts (keywords with thesaurus expansion).
Both annotated and unannotated images can also be retrieved using the semantic space and regular content-based techniques.
This brief demo will concentrate on retrieval of annotated images using keyword matching, and unannotated images using the semantic space.
Conclusions
Semantic retrieval of unannotated images is hard!
Our semantic space approach takes us some of the way, but there is still a long way to go.
Retrieval is limited by the choice of visual features, and how well those features relate to the keywords.
Questions?