
Image Retrieval Part II

2

Topics

• Applications of CBIR in digital libraries

• Human-controlled interactive CBIR

• Machine-controlled interactive CBIR

3

Query by Example

[Diagram: query sample → CBIR → results ("Get similar images")]

• Pick query examples and ask the system to retrieve “similar” images.

4

QBIC(TM) – IBM's Query By Image Content

http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English

6

NETRA @ UCSB

http://nayana.ece.ucsb.edu/M7TextureDemo/Demo/client/M7TextureDemo.html

7

Medical Decision Support

• Breast cancer is among the top killers of women in the developed world.

• Early detection of malignancy can greatly reduce the risk of death.

Mammogram

8

Nationwide Support for Physicians

9

Summary of Fundamental CBIR

• CBIR using query by example

• CBIR Algorithm:

• First step --- Image indexing

• Second step --- Content matching

• Third step --- Ranking and displaying

• Relevance feedback (RF)

10

Step I: Image Indexing

• Image content = { Color, Shape, Texture }

• color: Color histogram, Color Moments

• Shape: Chaincodes, Fourier descriptors

• Texture: Gabor wavelet features, Co-occurrence matrix

• Feature vector $\mathbf{v}$ = { Color, Shape, Texture }

• Example: each database image is indexed by a feature vector $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4, \mathbf{v}_5, \ldots$, e.g.

$$\mathbf{v} = [\,0.2 \;\; 0.05 \;\; \ldots \;\; 0.1\,]$$
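To make the indexing step concrete, here is a minimal sketch of one common choice, a normalized color histogram, written in plain numpy; the function name `color_histogram` and the 8-bins-per-channel setting are illustrative assumptions, not part of the original system.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Index an RGB image (H x W x 3, uint8) as a normalized color histogram.

    Each channel is quantized into `bins` levels, giving a 3*bins-dimensional
    feature vector v, e.g. v = [0.2, 0.05, ..., 0.1].
    """
    features = []
    for channel in range(3):
        hist, _ = np.histogram(image[:, :, channel], bins=bins, range=(0, 256))
        features.append(hist)
    v = np.concatenate(features).astype(float)
    return v / v.sum()   # normalize so the components sum to 1

# Example: index a synthetic 64x64 image
img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
v = color_histogram(img)   # shape (24,) for 8 bins per channel
```

Shape and texture descriptors (Fourier descriptors, Gabor wavelet features, etc.) would be appended to this vector in the same fashion.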

11

Example of feature vectors in a relational database

12

Step II: Content Matching

• Content similarity is measured by a distance function $D(\mathbf{v}, \mathbf{v}_q)$,

where $\mathbf{v}_q$ is the feature vector of the query image and

$\mathbf{v}$ is the feature vector of an image in the database.

• Many distance functions have been used:

• Euclidean distance

• l1-norm

• cosine measure

$$\mathbf{v}_q = [v_{q1}, v_{q2}, \ldots, v_{qN}]^T, \qquad \mathbf{v} = [v_1, v_2, \ldots, v_N]^T$$

Euclidean distance:
$$D(\mathbf{v}, \mathbf{v}_q) = \left[\sum_{i=1}^{N} (v_i - v_{qi})^2\right]^{1/2}$$

l1-norm:
$$D(\mathbf{v}, \mathbf{v}_q) = \sum_{i=1}^{N} |v_i - v_{qi}|$$

cosine measure:
$$S(\mathbf{v}, \mathbf{v}_q) = \frac{\mathbf{v}^T \mathbf{v}_q}{\|\mathbf{v}\|\,\|\mathbf{v}_q\|}$$
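A small sketch of the three measures listed above, assuming both feature vectors are 1-D numpy arrays of the same length; the function names are illustrative.

```python
import numpy as np

def euclidean(v, vq):
    """D(v, v_q) = sqrt(sum_i (v_i - v_qi)^2); smaller means more similar."""
    return float(np.sqrt(np.sum((v - vq) ** 2)))

def l1_norm(v, vq):
    """D(v, v_q) = sum_i |v_i - v_qi|."""
    return float(np.sum(np.abs(v - vq)))

def cosine(v, vq):
    """S(v, v_q) = v.v_q / (||v|| ||v_q||); larger means more similar."""
    return float(np.dot(v, vq) / (np.linalg.norm(v) * np.linalg.norm(vq)))
```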

13

Step III: Similarity Ranking

• Calculate $D_j = D(\mathbf{v}_j, \mathbf{v}_q)$ for each image $j = 1, \ldots, J$ in the database.

• Sort the distances in increasing order (assume J = 10):

$$D_6 \le D_2 \le D_9 \le D_{10} \le D_7 \le D_3 \le D_8 \le D_1 \le D_4 \le D_5$$

• top 3 images: image6, image2, image9
• top 5 images: image6, image2, image9, image10, image7
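The ranking step is then just a sort over the distances; a rough sketch follows, in which the dictionary layout of the database is an assumption for illustration.

```python
import numpy as np

def rank_database(vq, database, top_k=3):
    """Return the ids of the top_k images closest to the query vector vq.

    `database` maps image ids to feature vectors; smaller distance = better.
    """
    distances = {j: np.sqrt(np.sum((v - vq) ** 2)) for j, v in database.items()}
    return sorted(distances, key=distances.get)[:top_k]   # increasing distance

# Toy example with J = 10 images
rng = np.random.default_rng(0)
db = {f"image{j}": rng.random(24) for j in range(1, 11)}
print(rank_database(rng.random(24), db, top_k=5))
```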

14

Problems with CBIR

In essence, retrieval is a pattern recognition problem with special characteristics.

• Huge volume of (visual) data.

• High dimensionality in feature space.

• Query design: Gap between the high level concepts and low level features.

• Linear matching criteria: a mismatch with models of human perception.

15

Example: Compressed Domain

Visual databases in the compressed domain:

• DCT: used in many current image/video coding standards: JPEG, MPEG-1/2, and H.261/3.

• Wavelets/VQ: related to the newer image coding standard JPEG2000.

There is a significant gap between human visual perception and the information representation in the DCT/wavelet domain.

[Diagram: image.jpg / video.mpg → JPEG/MPEG decoding → DCT coefficients → feature vector → database]

16

State-of-the-art

• Human-controlled interactive CBIR (HCI-CBIR)
  • Integrating human perception into content-based retrieval.

• Machine-controlled interactive CBIR (MCI-CBIR)
  • To reduce the bandwidth requirement for browsing and searching over the Internet.
  • To minimize errors caused by excessive human involvement.

Integrating human perception into content-based retrieval

18

Scenario

• Machine provides initial retrieval results, through query-by-keyword, sketch, or example, etc.

• Iteratively:
  • User provides judgment on the current results as to whether, and to what degree, they are relevant to her/his request.
  • The machine learns and tries again.

19

Relevance Feedback

[Diagram: initial sample → query → 1st result → feedback → 2nd result → feedback → ...]

• User gives feedback on the query results.
• System recalculates the feature weights and modifies the query.

20

Basic GUI for Relevance Feedback

[Screenshot: relevance feedback GUI with slider and checkbox controls]

21

[Screenshot: retrieval GUI with an image group, palette panel, and result view]

22

3D MARS

[Screenshot: 3D MARS interface; images are arranged along structure, color, and texture axes; initial display and result views]

23

Human-Controlled Interactive CBIR (HCI-CBIR)

• An attractive solution to numerous applications

• Main feature: an active role played by users to improve retrieval accuracy

• State-of-the-art issues:
  • query design considerations
  • linear criteria in similarity ranking

24

Effective Retrieval through User Interaction

• Current systems:

• QBIC: interactive region segmentation (IBM).

• FourEyes: including the human in the image annotation and retrieval loop (MIT Media Lab).

• WebSEEk: dynamic feature vector recomputation based on the user's feedback (Columbia University).

• PicHunter: a Bayesian framework for content based image retrieval (NEC Research Institute).

• PicToSeek: (UVA).

• MARS: a relevance feedback architecture in image retrieval (UIUC).

25

A “New” Proposal for HCI-CBIR

The framework: Relevance feedback

The key features:

Modeling: mapping a high level concept to low level features

Matching:

capturing user’s perceptual subjectivity to modify the query using non-linear measurement

overcoming the difficulties faced by the traditional linear matching criteria

26

The Relevance Feedback Framework

The goal: measure feature relevance to improve performance of image matching in retrieval

A supervised learning procedure based on regression: for a given query $\mathbf{z}$ and a set of retrieved items $\mathbf{x}_n \in \mathbb{R}^p$, $n = 1, 2, \ldots, N$, categorize the $\mathbf{x}_n$ into two classes:

a relevant class (visually similar to $\mathbf{z}$): $\mathbf{x}_m$, $m = 1, 2, \ldots, M$, and an irrelevant class (not similar to $\mathbf{z}$): $\mathbf{x}_q$, $q = 1, 2, \ldots, Q$.

Construct a new query based on the information in $\mathbf{x}_m$ and $\mathbf{x}_q$, and use the new query in the next round of retrieval.

27

Query Modification Model 1

The new query is the centroid of the relevant images:

$$\mathbf{z}_1 = \frac{1}{M} \sum_{m=1}^{M} \mathbf{x}_m$$

[Diagram: feature space showing the original query, the modified query, the relevant images, and the nonrelevant images]

28

Query Modification Models 2 & 3 (Anti-Reinforcement Learning)

Where:

• $\mathbf{z}$ : query at the previous iteration
• $\bar{\mathbf{x}}_R = \frac{1}{M}\sum_{m=1}^{M}\mathbf{x}_m$ : center of the relevant items
• $\bar{\mathbf{x}}_N = \frac{1}{Q}\sum_{q=1}^{Q}\mathbf{x}_q$ : center of the non-relevant items
• $\alpha_R, \alpha_N$ : small positive constants

The new query:

$$\mathbf{z}_2 = \bar{\mathbf{x}}_R - \alpha_N(\bar{\mathbf{x}}_N - \mathbf{z})$$

$$\mathbf{z}_3 = \mathbf{z} + \alpha_R(\bar{\mathbf{x}}_R - \mathbf{z}) - \alpha_N(\bar{\mathbf{x}}_N - \mathbf{z})$$

[1-D example: the modified query moves a small step toward the center of the relevant items and away from the centers of the non-relevant items.]
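A compact numpy sketch of the three query-update rules (Models 1-3) as reconstructed above; the step sizes `alpha_r` and `alpha_n` stand for the small positive constants on the slide, and their default values here are arbitrary illustrations.

```python
import numpy as np

def model1(relevant):
    """Model 1: the new query is the centroid of the relevant images."""
    return np.mean(relevant, axis=0)

def model2(z, relevant, nonrelevant, alpha_n=0.2):
    """Model 2: relevant centroid, pushed away from the nonrelevant centroid."""
    x_r, x_n = np.mean(relevant, axis=0), np.mean(nonrelevant, axis=0)
    return x_r - alpha_n * (x_n - z)

def model3(z, relevant, nonrelevant, alpha_r=0.5, alpha_n=0.2):
    """Model 3 (anti-reinforcement): start from the previous query z, move toward
    the relevant centroid and away from the nonrelevant centroid."""
    x_r, x_n = np.mean(relevant, axis=0), np.mean(nonrelevant, axis=0)
    return z + alpha_r * (x_r - z) - alpha_n * (x_n - z)
```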

29

Nonlinear Search Unit

Non-linear (Gaussian) Search Unit (NSU):

$$f(\mathbf{x}, \mathbf{z}) = \sum_{i=1}^{P} G_i(x_i, z_i) = \sum_{i=1}^{P} \exp\left(-\frac{(x_i - z_i)^2}{2\sigma_i^2}\right)$$

- image feature vector: $\mathbf{x} = [x_1, \ldots, x_i, \ldots, x_P]^T$
- adjustable query vector: $\mathbf{z} = [z_1, \ldots, z_i, \ldots, z_P]^T$
- the tuning parameters (NSU widths): $\sigma_i = \max_m |x_{mi} - z_i|, \quad i = 1, \ldots, P$

• Small $\sigma_i$: a relevant feature (sensitive to change)

• Large $\sigma_i$: a nonrelevant feature (insensitive to change)

30

Linear Search Unit (LSU)

• To benchmark the performance of the NSU

• To initiate the search

• The parameters: exactly the same as in the NSU

$$S_{\text{linear}}(\mathbf{x}, \mathbf{z}) = d^2(\mathbf{x}, \mathbf{z}) = \sum_{i=1}^{P} \frac{(x_i - z_i)^2}{\sigma_i^2}$$
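The sketch below implements the NSU and LSU as reconstructed above, with the widths estimated from the relevant samples; the small constant added to the widths (to avoid division by zero) is an implementation assumption.

```python
import numpy as np

def nsu_widths(z, relevant):
    """sigma_i = max_m |x_mi - z_i| over the relevant samples (rows of `relevant`)."""
    return np.max(np.abs(relevant - z), axis=0) + 1e-12   # guard against zero width

def nsu_similarity(x, z, sigma):
    """Non-linear (Gaussian) search unit: sum_i exp(-(x_i - z_i)^2 / (2 sigma_i^2))."""
    return float(np.sum(np.exp(-((x - z) ** 2) / (2.0 * sigma ** 2))))

def lsu_distance(x, z, sigma):
    """Linear search unit: weighted Euclidean distance using the same sigma_i."""
    return float(np.sum(((x - z) ** 2) / (sigma ** 2)))
```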

31

Architecture for Interactive CBIR

[Block diagram: feature vectors are extracted offline from the image database and stored in RAM (feature database). At iteration n = 0 the (tentative) query image drives an initial search through feature extraction and a similarity measure; at iterations n > 0 interactive searching uses a perceptual similarity measure whose weights/parameters are updated from user interaction. Output: the k retrieved images.]

32

VQ Codewords as “content descriptors”

[Diagram: the image blocks of an input image are matched against a codebook of codewords i = 1, ..., n, producing a sequence of code labels.]

The usage of codewords reflects the content of the input image encoded.

33

Two-level WT/VQ coding (1 bpp)

[Diagram: Mallat's two-level wavelet decomposition; the subbands are vector-quantized at rates from 0 to 8 bpp using a multiresolution codebook (CB1 for HL2, CB2 for HL1, CB3 for VL2, CB4 for VL1, CB5 for DL2), and the code labels of each subband form the label histograms H1-H5.]
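As a rough illustration of how codeword usage becomes a content descriptor, the sketch below encodes image blocks with a given codebook and returns the normalized label histogram; the array layout of the blocks and codebook is an assumption for illustration.

```python
import numpy as np

def vq_label_histogram(blocks, codebook):
    """blocks: (num_blocks, block_dim); codebook: (num_codewords, block_dim).

    Each block is mapped to its nearest codeword; the normalized histogram of
    the resulting code labels describes the content of the encoded image.
    """
    d = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)                      # nearest codeword per block
    hist = np.bincount(labels, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

In the WT/VQ scheme above, one such histogram corresponds to each subband (H1-H5) in the figure.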

34

Test Database 1: the Brodatz database

• A texture image database provided by Manjunath, at http://vivaldi.ece.ucsb.edu/users/wei/codes.html

• 1,856 patterns in 116 different classes

• 16 similar patterns in each class

• Maintained as a single unclassified image database

35

Queries (the Brodatz Database)

[116 different image classes]

36

Performance Comparison

• Methods Compared

• LSU2: linear search unit & query model 2

• NSU2: non-linear search unit & query model 2

• Interactive CBIR in MARS: Multimedia Analysis and Retrieval System (developed at UIUC)

37

Retrieval Results (the Brodatz Database)

Table 1. Average retrieval rate (%)

Method | t=0  | t=1  | t=2  | t=3
LSU2   | 73.7 | 83.0 | 85.1 | 85.9
NSU2   | 73.7 | 84.9 | 88.2 | 89.2
MARS   | 67.0 | 75.1 | 76.4 | 76.7

Note: The retrieval rate is defined as the average percentage of images belonging to the same class as the query among the top 16 matches.

iARM: Interactive-based Analysis and Retrieval of Multimedia

On the Internet

@iarm.ee.ryerson.ca:8000/corel

39

Strategy

• iARM implements interactive retrieval for the large image database, running on the J2EE Web Server.

• Interaction architecture:
  • based on a non-linear relevance feedback model, a multi-model SRBF network
  • positive and negative feedback
  • properties: local and non-linear learning, fast and robust with small input data

40

A Single-Pass Radial Basis Function (SRBF) Network

• The positive examples: $\mathbf{x}_m^{(r)}$, $m = 1, 2, \ldots, M$.

• The SRBF network characterizes the query by multiple clusters, each of which is modeled by a P-dimensional Gaussian as:

$$G(\mathbf{x}, \mathbf{x}_m^{(r)}, \sigma_m) = \exp\left(-\sum_{p=1}^{P} \frac{\left(x_p - x_{mp}^{(r)}\right)^2}{2\sigma_m^2}\right) \qquad (1)$$

where

$$\sigma_m = \min_{i \neq m} \left\|\mathbf{x}_m^{(r)} - \mathbf{x}_i^{(r)}\right\|, \quad i = 1, 2, \ldots, M \qquad (2)$$

41

SRBF Network (cont.)

• The weighted-Euclidean space: the p-th feature component is weighted by

$$\delta_p = \begin{cases} 1/\lambda_p, & \lambda_p \neq 0 \\ 1, & \lambda_p = 0 \end{cases} \qquad (3)$$

where

$$\lambda_p = \left[\frac{1}{M}\sum_{m=1}^{M}\left(x_{mp}^{(r)} - \bar{x}_p\right)^2\right]^{1/2} \qquad (4)$$

and $\bar{x}_p$ is the mean of the p-th component over the positive examples.

• A summation of the M Gaussian units (1) yields the similarity function for an input vector $\mathbf{x}$:

$$S(\mathbf{x}) = \sum_{m=1}^{M} G(\mathbf{x}, \mathbf{x}_m^{(r)}, \sigma_m) \qquad (5)$$
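A minimal numpy sketch of the SRBF similarity (equations 1, 2, and 5), ignoring the per-feature weighting of (3)-(4) for brevity; the small constant added to the widths is an implementation assumption.

```python
import numpy as np

def srbf_widths(positives):
    """sigma_m: distance from each positive example to its nearest other positive (eq. 2)."""
    M = len(positives)
    sigmas = np.empty(M)
    for m in range(M):
        d = np.linalg.norm(positives - positives[m], axis=1)
        d[m] = np.inf                      # exclude the sample itself
        sigmas[m] = d.min()
    return sigmas + 1e-12

def srbf_similarity(x, positives, sigmas):
    """S(x): sum of M Gaussian units centred on the positive examples (eqs. 1 and 5)."""
    d2 = np.sum((positives - x) ** 2, axis=1)
    return float(np.sum(np.exp(-d2 / (2.0 * sigmas ** 2))))
```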

Single-class approach vs. multi-class approach

[Diagram: feature-space plots comparing a single-cluster query model with a multi-cluster query model; legend: original query, modified query, relevant images, nonrelevant images]

43

Negative Feedback Strategy

• Tuning the decision boundary with the negative samples $\mathbf{x}_n^{(ir)}$, $n = 1, 2, \ldots, N$.

• Anti-reinforced learning algorithm:

• If

$$D_i \leq D_k, \quad \forall k \neq i, \qquad D_i = \sum_{p=1}^{P} \left(x_p^{(ir)} - x_{ip}^{(r)}\right)^2 \qquad (6)$$

• Then

$$\mathbf{x}_i^{(r)}(t+1) = \mathbf{x}_i^{(r)}(t) - \alpha(t)\left[\mathbf{x}_n^{(ir)}(t) - \mathbf{x}_i^{(r)}(t)\right] \qquad (7)$$
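A sketch of the anti-reinforced update as reconstructed in (6)-(7): the cluster centre closest to a negative sample is pushed away from it; the learning-rate value is illustrative.

```python
import numpy as np

def anti_reinforce(centres, x_negative, lr=0.1):
    """centres: (M, P) relevant-cluster centres; x_negative: (P,) nonrelevant feature vector."""
    d = np.sum((centres - x_negative) ** 2, axis=1)            # D_i for every cluster (eq. 6)
    i = int(np.argmin(d))                                      # winning (closest) cluster
    centres[i] = centres[i] - lr * (x_negative - centres[i])   # eq. 7: move away from the sample
    return centres
```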

44

Performance of IARM

• Using Corel Image Collection, containing 40,000 real-life images, www.corel.com.

• A total of 400 queries were generated, and relevance judgments were based on the ground truth from Corel.

• Multiple descriptors:

<shape, color, texture>

<Fourier descriptors, HSV color histogram& color moments, Gabor Wavelet transform>

45

Result

Method               | Avg. precision (%)
Non-interactive CBIR | 53.00
iARM, r(1)           | 80.08
iARM, r(2)           | 85.99
iARM, r(3)           | 87.58
iARM, r(8)           | 89.00

Table 1: Average Precision Rate (%) obtained by retrieving 400 queries, measured from the top 16 retrievals.

Test 1: Fast and Robust with small # relevance feedbacks

Example: Looking for "model". 0. Start by choosing the image at the bottom right corner as the query.

1. Result after the initial search, then five relevant images are the feedbacks.

2. Result after one relevance feedback: all the top sixteen are relevant.

Test 2: Non-linearity

1. Initial result for a kung-fu performance: here shape plays a very important role, but only seven images are relevant.

2. After one feedback: non-relevant images that had similar shapes were removed, and all returned images are kung-fu.

After two feedbacks:
Case 1 (left): only the full-body actions were selected.
Case 2 (right): only the half-body actions were selected.

Test 3: Multi-model capturing

Multi-modeling can capture the local context defined by the current query session very precisely.

Initial results Final results

56

Summary

Incorporating human perception in retrieval:
• significant improvement over simple CBIR
• impressive improvement over other relevance feedback systems
• satisfactory support for user queries

The query models: effective in the compressed domain

Machine Controlled Interactive Content-based Retrieval (MCI-CBR)

58

Problem with HCI-CBR

• User interaction requires:
  • the user to specify 'relevance' or 'nonrelevance'
  • tolerating inconsistent human performance
  • repeating many feedback iterations for convergence
  • transmission of sample files, i.e., high bandwidth

• User-friendly environment (ideal preference):
  • fewer training samples, i.e., < 20 images/iteration
  • fewer feedback iterations, i.e., 1-2 iterations

59

Search distributed DVLs on the Internet

Wide Coverage.

Full Features Search.

[Diagram: a Search Agent Broker (SOLO) mediates between the user and N distributed databases; each node (DB-1 ... DB-N) holds its content and feature database, an Archivist Engine, and a SAPM host.]

Third party Workshop.

Third party Service.

Progressive Search.

Knowhow Control.

60

Machine Controlled Interactive CBIR (MCI-CBIR)

A key research area in multimedia processing

Aim: to incorporate a self-learning capability into CBIR, which allows:
• automatic & semi-automatic retrieval
• minimization of user participation, to reduce errors caused by inconsistent human performance
• reduction of the bandwidth requirement in Internet browsing and searching

61

HCI-CBIR vs MCI-CBIR

Human-Controlled Interactive System vs. Machine-Controlled Interactive System

[Diagram: both systems take a query, pass it to a search unit over the image database, and produce retrieval results; in the human-controlled system the relevance feedback loop goes through user interaction, while in the machine-controlled system the relevance feedback is generated by the machine itself.]

62

The Essence of MCI-CBR

• Based on two feature spaces: R1 and R2

• Space R1 is of reasonable quality and easy to compute during retrieval, such as:
  • DCT
  • DWT

• Space R2 is of very high quality, but potentially computationally intensive, and is used for relevance identification:
  • descriptors extracted from uncompressed images
  • object- and region-based descriptors

63

Architecture for Automatic Interactive CBR

[Block diagram: the query drives a search unit over the WT/VQ-coded image database using compressed-domain processing and feature extraction; the retrieval results pass through relevance classification and a relevance feedback module (RBFN), which updates the query; optional user interaction (semi-automatic mode) accounts for different perceptual subjectivity; the retrieved images are displayed as output.]

64

SOTM

• Relevance classification is performed by a Self-Organizing Tree Map (SOTM), which offers:
  • independent learning based on a competitive learning technique
  • a unique feature map that preserves topological ordering

• SOTM is more suitable than the conventional SOM when the input feature space is of high dimensionality

65

SOTM algorithm

Step one: Initialize the root node with a point selected at random from the input space.

The root node is represented by the blue cross, and the data space is represented by the green spots.

66

SOTM algorithm Cont..

Step two: Randomly select a new data point $\mathbf{x}$, and compute the Euclidean distance, $d_j$, to node $\mathbf{w}_j$ (j = 1, ..., J), where J is the total number of nodes; here J = 1.

Step three: Select the winning node, j*, with minimum $d_j$.

67

SOTM algorithm Cont..

Step four: If $d_{j^*}(\mathbf{x}, \mathbf{w}_{j^*}) \leq$ Threshold, where the Threshold decreases with time, then assign $\mathbf{x}$ to the j*-th cluster and update the weight vector according to:

$$\mathbf{w}_{j^*}(t+1) = \mathbf{w}_{j^*}(t) + \alpha(t)\left[\mathbf{x}(t) - \mathbf{w}_{j^*}(t)\right],$$

where $\alpha(t)$ is the learning rate, $0 < \alpha(t) < 1$. The position of the updated node is indicated by the red arrow.

68

SOTM algorithm Cont..

Step four (cont.): Else, form a new subnode starting with $\mathbf{x}$. The map now has two nodes, as indicated by the two blue arrows.

Step five: Continue from step two.
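The steps above translate into a short clustering routine; a rough sketch follows, in which the initial threshold, its decay schedule, and the learning rate are illustrative assumptions rather than values from the original work.

```python
import numpy as np

def sotm(data, threshold0=1.0, decay=0.995, lr=0.1, epochs=5, seed=0):
    """Minimal Self-Organizing Tree Map over `data` (num_samples x dim)."""
    rng = np.random.default_rng(seed)
    nodes = [data[rng.integers(len(data))].copy()]    # step 1: random root node
    threshold = threshold0
    for _ in range(epochs):
        for x in rng.permutation(data):               # step 2: pick data points at random
            d = [np.linalg.norm(x - w) for w in nodes]
            j = int(np.argmin(d))                     # step 3: winning node j*
            if d[j] <= threshold:                     # step 4: update the winner ...
                nodes[j] += lr * (x - nodes[j])
            else:                                     # ... or grow a new subnode
                nodes.append(x.copy())
            threshold *= decay                        # threshold decreases with time
    return np.array(nodes)
```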

69

SOTM vs SOM

• SOTM: no nodes converge to areas of zero data density.

• SOM: nodes converge to areas of zero data density.

70

Retrieval Procedure (1)

a. Initial search: for a given query z, retrieve the set of the K most similar images, F = {O_1, O_2, …, O_K}, using the 'nearest neighbor rule' and the feature space R1.

b. Characterization: use features in feature space R2 to describe the retrieved images:

O_k → x_k

and obtain the training set:

F(R2) = {x_1, …, x_k, …, x_K},  x_k ∈ R2

71

Retrieval Procedure (2)

c. Relevance classification: use SOTM to classify the training vectors in F(R2), then use the results to label the retrieved images:

{O_k, y_k}, k = 1, …, K, where y_k = 1 if O_k is relevant and y_k = 0 otherwise

d. Relevance Feedback Module: implement interactive learning methods (e.g., RF, non-linear RF, or RBFN) using the training set {Ok,yk}, k=1,…,K, and ‘feature space R1’ for retrieval

e. Go back to Step (b)
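A rough end-to-end sketch of steps (a)-(e). To keep it self-contained, the SOTM relevance classification is replaced here by a simple stand-in (items closest to the R2 centroid of the retrieved set are labelled relevant), and the feedback step uses a plain centroid update instead of the RBFN module; both substitutions are assumptions for illustration only.

```python
import numpy as np

def mci_cbir(query_id, feats_r1, feats_r2, K=16, iterations=2):
    """feats_r1 / feats_r2: dicts {image_id: feature vector} in spaces R1 and R2."""
    z = feats_r1[query_id].copy()
    retrieved = []
    for _ in range(iterations + 1):
        # (a) search with the nearest-neighbour rule in the cheap space R1
        retrieved = sorted(feats_r1, key=lambda j: np.linalg.norm(feats_r1[j] - z))[:K]
        # (b) characterize the retrieved images in the high-quality space R2
        X2 = np.array([feats_r2[j] for j in retrieved])
        # (c) pseudo relevance labels y_k (stand-in for the SOTM classifier)
        d = np.linalg.norm(X2 - X2.mean(axis=0), axis=1)
        y = (d <= np.median(d)).astype(int)
        # (d) relevance feedback: move the R1 query toward the "relevant" items
        relevant = [feats_r1[j] for j, yk in zip(retrieved, y) if yk == 1]
        z = np.mean(relevant, axis=0)
        # (e) loop back to (a)/(b) with the updated query
    return retrieved
```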

72

Experiment Setup

• Brodatz texture database (1,856 images), using 116 queries

• ARR: average retrieval rate based on ground-truth classes:

$$RR(q) = \frac{N_G(q)}{N_F(q)} \in [0, 1]$$

where $N_F(q)$ denotes the size of the ground-truth set for query $q$, and $N_G(q)$ denotes the number of ground-truth images found within the top 16.

• Feature representations:
  • R1: MHI features on compressed WT/VQ images
  • R2: Gabor wavelet features
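For reference, the ARR computation itself is a one-liner per query; a minimal sketch:

```python
def retrieval_rate(top16_ids, ground_truth_ids):
    """RR(q) = N_G(q) / N_F(q): fraction of the ground-truth set found in the top 16."""
    return len(set(top16_ids) & set(ground_truth_ids)) / len(ground_truth_ids)

def average_retrieval_rate(results, ground_truth):
    """ARR: mean retrieval rate over all queries q."""
    return sum(retrieval_rate(results[q], ground_truth[q]) for q in results) / len(results)
```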

73

Automatic vs User Controlled Retrieval

Method        | ARR (%), 0 Iter. | 1 Iter. | 4 Iter. | User's RF (Iter.)
(a) MCI-CBR   | 63.42            | 71.66   | 76.51   | -
(b) HCI-CBR   | 63.42            | 77.64   | 80.17   | 4
Δ = (b) − (a) | -                | +5.98   | +3.66   |

Table 1: A comparison of ARR (%) between the MCI-CBR and HCI-CBR methods, obtained by retrieving 116 queries on the Brodatz database.

74

Retrieval Example

Non-Interactive Retrieval Automatic Interactive Retrieval

75

Semi-Automatic VS User-controlled Retrievals

A comparison of ARR at convergence between the semi-automatic and HCI-CBR methods

76

Retrieval using DCT-compressed images

• Compressed domain descriptor is based on energy histogram of the low frequency DCT coefficients [Lay, 1999]

• JPEG photograph database distributed by Media Graphic Inc, consisting of nearly 4,700 JPEG color images

77

Result

Method                | Avg. Relative Precision (%) | Avg. # user RF for convergence (Iter.)
Non-interactive CBIR  | 49.82                       | -
Automatic interaction | 79.18                       | -
User controlled       | 95.66                       | 2.63
Semi-automatic CBIR   | 98.08                       | 1.33

Retrieval results on JPEG database, Column 2: average relative precision (%); Column 3: average number of user feedbacks (iteration) required for convergence, averaged over 30 queries.

Non-interactive CBIR Automatic-interactive CBIR

Semiautomatic CBIR (two user’s RF) User controlled CBIR (three user’s RF)

Non-interactive CBIR Automatic-interactive CBIR

Semiautomatic CBIR (one user’s RF) User controlled CBIR (two user’s RF)

80

Summary

• MCI-CBIR minimizes the role of users in CBIR.

• Semi-automatic retrieval reaches optimal performance quickly.
