A Dynamic Probabilistic Multimedia Retrieval Model

ICME 2004

Tzvetanka I. Ianeva

Arjen P. de Vries

Thijs Westerveld


Introduction

• Video representation schemes used for retrieval:
  – Static
  – Spatio-temporal

• Video is a temporal medium, so a ‘good’ model should overcome the limitations of keyframe-based shot representation


Spatio-temporal grouping

• Spatial priority and tracking of regions from frame to frame

• Joint spatial and temporal segmentation
  – Human vision finds salient structures jointly in space and time (Gepshtein and Kubovy, 2000)


Motivation

• Pursue video retrieval instead of image (keyframe) retrieval

• Extension of the Static Probabilistic Multimedia Retrieval model (2003)

• GMM in the DCT-space-time domain
  – Diagonal covariance


Static Model

[Figure: documents (Docs) and their models (Models)]

• Indexing
  – Estimate Gaussian Mixture Models from images using EM
  – Based on feature vectors with colour, texture and position information from pixel blocks
  – Fixed number of components
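The indexing step can be illustrated with a small, stdlib-only EM loop for a Gaussian mixture with diagonal covariances (the covariance form named on the Motivation slide). This is a sketch under assumptions, not the authors' code: the function names, initialisation from randomly chosen data points, the fixed iteration count and the variance floor `var_floor` (one common guard against singular, infinite-likelihood solutions) are all illustrative choices.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - mi) ** 2 / v
                      for xi, mi, v in zip(x, mean, var))


def fit_diagonal_gmm(data, n_components=8, n_iter=50, var_floor=1e-3, seed=0):
    """Fit a fixed-size diagonal GMM to `data` (list of feature vectors) with EM.

    Returns (weights, means, variances), one entry per component.
    """
    rng = random.Random(seed)
    dim, n = len(data[0]), len(data)
    # Initialisation: means are randomly chosen data points,
    # variances start at the (floored) global variance of each dimension.
    means = [list(x) for x in rng.sample(data, n_components)]
    centre = [sum(x[d] for x in data) / n for d in range(dim)]
    spread = [max(var_floor, sum((x[d] - centre[d]) ** 2 for x in data) / n)
              for d in range(dim)]
    variances = [list(spread) for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample (log-sum-exp for stability).
        resp = []
        for x in data:
            logp = [math.log(weights[c]) + log_gauss_diag(x, means[c], variances[c])
                    for c in range(n_components)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            resp.append([pc / total for pc in p])
        # M-step: re-estimate weights, means and floored diagonal variances.
        weights, means, variances = [], [], []
        for c in range(n_components):
            nc = max(sum(r[c] for r in resp), 1e-12)  # guard against empty components
            mu = [sum(r[c] * x[d] for r, x in zip(resp, data)) / nc for d in range(dim)]
            var = [max(var_floor,
                       sum(r[c] * (x[d] - mu[d]) ** 2 for r, x in zip(resp, data)) / nc)
                   for d in range(dim)]
            weights.append(nc / n)
            means.append(mu)
            variances.append(var)
    return weights, means, variances
```

With the C = 8 used in the experiments, `fit_diagonal_gmm(features, n_components=8)` would yield one model per keyframe; a different seed gives a different random initialisation.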


Static Model

• Indexing
  – Estimate a Gaussian Mixture Model from each keyframe (using EM)
  – Fixed number of components (C = 8)
  – Feature vectors contain colour, texture, and position information from pixel blocks: <x, y, DCT>
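The `<x, y, DCT>` feature vectors can be sketched as follows. The sketch assumes non-overlapping 8×8 pixel blocks, the block centre as the position, and the first `n_coeffs` DCT coefficients of each colour channel in a low-frequency-first order (the DC coefficient roughly carries colour, the others texture). Block size, coefficient count, ordering and function names are assumptions, not values from the slides.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for i in range(n):
                for j in range(n):
                    s += (block[i][j]
                          * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * j + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def block_features(channels, block=8, n_coeffs=4):
    """One <x, y, DCT...> vector per non-overlapping block.

    channels: list of equally sized 2-D lists (e.g. Y, Cb, Cr).
    """
    height, width = len(channels[0]), len(channels[0][0])
    # Low-frequency coefficients first, ordered by u + v (a zigzag-like order).
    order = sorted(((u, v) for u in range(block) for v in range(block)),
                   key=lambda uv: (uv[0] + uv[1], uv[0]))[:n_coeffs]
    feats = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            vec = [left + block / 2, top + block / 2]  # block centre (x, y)
            for ch in channels:
                patch = [row[left:left + block] for row in ch[top:top + block]]
                coeffs = dct2(patch)
                vec.extend(coeffs[u][v] for u, v in order)
            feats.append(vec)
    return feats
```

For a three-channel frame, `block_features([y, cb, cr])` returns one vector of length 2 + 3 · `n_coeffs` per block.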


Static Model

[Figure: query samples scored against each collection model: P(Q|M1), P(Q|M2), P(Q|M3), P(Q|M4)]

• Retrieval
  – Calculate the conditional probabilities of the query samples given each model in the collection
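The retrieval step can be sketched as scoring each collection model by the log-likelihood of all query samples, log P(Q|M) = Σ_j log P(x_j|M), under the assumption that the samples are independent, and ranking models by that score. Models are hypothetical (weights, means, variances) tuples of a diagonal GMM; this is an illustration, not the authors' implementation.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - mi) ** 2 / v
                      for xi, mi, v in zip(x, mean, var))


def gmm_log_density(x, model):
    """log P(x | M) for a diagonal GMM given as (weights, means, variances)."""
    weights, means, variances = model
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)  # log-sum-exp to avoid underflow
    return top + math.log(sum(math.exp(t - top) for t in terms))


def rank_models(query_samples, models):
    """Rank model names by log P(Q | M) = sum_j log P(x_j | M)."""
    scores = {name: sum(gmm_log_density(x, m) for x in query_samples)
              for name, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)
```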


Dynamic Model

• Selecting frames
  – 1-second sequence around the keyframe
  – Entire video shot as a sequence of frames sampled at regular intervals

• Features: <x, y, t, DCT>
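The two frame-selection strategies amount to choosing frame indices within a shot. In this sketch the frame rate `fps`, the shot boundaries `first`/`last` and the sampling `step` are assumed inputs, and the function names are hypothetical.

```python
def frames_around_keyframe(keyframe, fps, first, last, seconds=1.0):
    """Indices of the frames in a `seconds`-long window centred on the keyframe,
    clipped to the shot boundaries [first, last]."""
    half = int(seconds * fps) // 2
    return list(range(max(first, keyframe - half), min(last, keyframe + half) + 1))


def frames_regular(first, last, step):
    """Every `step`-th frame of the whole shot [first, last]."""
    return list(range(first, last + 1, step))
```

At 25 frames per second, for example, a keyframe well inside its shot gets a 25-frame window; near a shot boundary the window is clipped.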


Dynamic Model

• Indexing
  – GMM of multiple frames around the keyframe
  – Feature vectors extended with a time stamp normalized to [0, 1]: <x, y, t, DCT>

[Figure: sampled frames on a normalized time axis (t = 0, 0.5, 1)]
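The time-stamp extension can be sketched by inserting t between the position and DCT components. The sketch assumes the selected frames are equally spaced and given in temporal order, so t is simply the frame's position in the sequence scaled to [0, 1]; giving a lone frame t = 0.5 and the function name are also assumptions.

```python
def add_time_feature(frame_features):
    """Turn per-frame <x, y, DCT...> vectors into <x, y, t, DCT...> vectors.

    frame_features: list (one entry per selected frame, in temporal order)
    of lists of <x, y, DCT...> vectors.
    """
    n = len(frame_features)
    out = []
    for k, vectors in enumerate(frame_features):
        t = k / (n - 1) if n > 1 else 0.5  # normalized time stamp in [0, 1]
        out.extend(v[:2] + [t] + v[2:] for v in vectors)
    return out
```

All vectors from all selected frames are then pooled into one training set, so a single GMM is estimated for the whole sequence.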


Query example: A single image

• Use an artificial sequence of 29 images built from the single query example, with time normalized between 0 and 1

• Extend the query image’s features with a fixed temporal feature value of 0.5
  – Better results and lower computational cost
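Both query-construction options can be sketched on the `<x, y, DCT>` vectors of the example image. The function names are hypothetical; the 29-frame count and the fixed value 0.5 come from the slide. The fixed-time query has 29 times fewer samples to score, which is consistent with its lower computational cost.

```python
def query_fixed_time(image_vectors, t=0.5):
    """Single-image query: insert the same time stamp t into every vector."""
    return [v[:2] + [t] + v[2:] for v in image_vectors]


def query_artificial_sequence(image_vectors, n_frames=29):
    """Treat the image as n_frames identical frames with t spread over [0, 1]."""
    return [v[:2] + [k / (n_frames - 1)] + v[2:]
            for k in range(n_frames) for v in image_vectors]
```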


Dynamic Model Advantages

• More training data for models
  – Less sensitive to random initialization

• Reduced dependency on selecting an appropriate keyframe

• Some spatio-temporal aspects of the shot are captured
  – (Dis-)appearance of objects


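Retrieval Framework

• Smoothing

$$RSV(\omega_i) = \frac{1}{N} \sum_{j=1}^{N} \log \left[ \kappa \, P(x_j \mid \omega_i) + (1 - \kappa) \, P(x_j) \right]$$

  where x_j (j = 1 to N) are the query samples, ω_i is the model of shot i, P(x_j) is the background density and κ is the mixing weight.

• Building dynamic GMMs
  – Problem: the likelihood can go to infinity (singular solutions)

$$P(x \mid \omega_i) = \sum_{c=1}^{N_C} P(C_{i,c}) \, G(x, \mu_{i,c}, \Sigma_{i,c})$$

$$G(x, \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^n \lvert \Sigma \rvert}} \, e^{-\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu)}$$

  with N_C mixture components C_{i,c} and n the feature dimension.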
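The smoothed RSV can be sketched in log space to avoid underflow. Assumptions, not from the slides: the background density P(x_j) is taken as a uniform mixture of all document models in the collection, κ = 0.9 is an arbitrary default, and models are (weights, means, variances) tuples of diagonal GMMs.

```python
import math


def log_gauss_diag(x, mean, var):
    """log G(x, mu, Sigma) for a diagonal Sigma given as a list of variances."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - mi) ** 2 / v
                      for xi, mi, v in zip(x, mean, var))


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmm_log_density(x, model):
    """log P(x | omega_i) = log sum_c P(C_ic) G(x, mu_ic, Sigma_ic)."""
    weights, means, variances = model
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                      for w, m, v in zip(weights, means, variances)])


def rsv(query, model, collection, kappa=0.9):
    """RSV(omega_i) = 1/N sum_j log[kappa P(x_j|omega_i) + (1 - kappa) P(x_j)], 0 < kappa < 1."""
    total = 0.0
    for x in query:
        doc = gmm_log_density(x, model)
        # Background P(x_j): uniform mixture over all document models (assumption).
        background = (logsumexp([gmm_log_density(x, m) for m in collection])
                      - math.log(len(collection)))
        total += logsumexp([math.log(kappa) + doc, math.log(1 - kappa) + background])
    return total / len(query)
```

With κ < 1, a query sample that a shot's model explains poorly is still scored through the background term, so one outlying sample cannot dominate the average.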


Experimental Set-up

• Build models for each shot
  – Static, Dynamic, Language

• Build queries from topics
  – Construct a simple keyword text query
  – Select a visual example
  – Rescale and compress example images to match video size and quality


Combining Modalities

• Independence assumption between textual and visual query parts
  – P(Qt, Qv | Shot) = P(Qt | LM) · P(Qv | GMM)

• Combination works if both runs are useful [CWI:TREC:2002]

• The dynamic run is more useful than the static run

Run            MAP
ASR only       .130
Static only    .022
Static+ASR     .105
Dynamic only   .022
Dynamic+ASR    .132
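Under the independence assumption, the product of the two probabilities becomes a sum of log scores. The sketch pairs a query-likelihood text score with precomputed visual log-likelihoods. The text model is an assumption: the slides only say P(Qt|LM), and Jelinek-Mercer smoothing with λ = 0.5 is swapped in here for illustration; function names are hypothetical.

```python
import math
from collections import Counter


def text_log_likelihood(query_terms, shot_terms, collection_terms, lam=0.5):
    """log P(Qt | LM), using Jelinek-Mercer smoothing (an assumed choice).

    collection_terms must be non-empty.
    """
    shot, coll = Counter(shot_terms), Counter(collection_terms)
    n_shot, n_coll = len(shot_terms), len(collection_terms)
    score = 0.0
    for term in query_terms:
        p_coll = coll[term] / n_coll
        if p_coll == 0:
            continue  # term unseen in the collection: ignored (assumption)
        p_shot = shot[term] / n_shot if n_shot else 0.0
        score += math.log(lam * p_shot + (1 - lam) * p_coll)
    return score


def combined_ranking(text_scores, visual_scores):
    """Rank shots by log P(Qt|LM) + log P(Qv|GMM), i.e. assuming independence."""
    combined = {s: text_scores[s] + visual_scores[s] for s in text_scores}
    return sorted(combined, key=combined.get, reverse=True)
```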


Combining Modalities

Dynamic: Higher initial precision

[Figure: static run vs. dynamic run]

Dow Jones Topic (120)

• Text query: “Dow Jones Industrial Average rise day points”

[Figure: text query + visual example = combined result]

Conclusions

• Dynamic model captures visual similarity better
  – Spatio-temporal aspects
  – More training data
  – Choice of an appropriate keyframe less critical
  – Less sensitive to the random initialization

• ASR + dynamic better than either alone


Future work

• More data requires more computational effort: optimizations?

• Avoid singular solutions: dynamic number of components?

• Full covariance in space-time <x, y, t>

• Integration of audio


Thanks !!!


Merging Run Results

• Combining (conflicting) examples is difficult [CWI:TREC:2002]

• A single example misses relevant shots

• Round-robin merging

[Figure: two ranked lists (ranks 1 to 10) interleaved into a combined list 1, 1, 2, 2, 3, 3, 4, 4, ...]
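Round-robin merging can be sketched as interleaving the ranked lists of the per-example runs rank by rank, giving the 1, 1, 2, 2, 3, 3 pattern. Skipping shots that an earlier list has already contributed is an assumption about duplicate handling, and the function name is hypothetical.

```python
def round_robin_merge(runs):
    """Interleave ranked lists: rank 1 of every run, then rank 2 of every run, and so on.

    Shots already placed by an earlier run are skipped (assumed duplicate handling).
    """
    merged, seen = [], set()
    depth = max((len(r) for r in runs), default=0)
    for rank in range(depth):
        for run in runs:
            if rank < len(run) and run[rank] not in seen:
                seen.add(run[rank])
                merged.append(run[rank])
    return merged
```

For two runs `["a", "b", "c"]` and `["x", "b", "y"]`, the merged list is `["a", "x", "b", "c", "y"]`.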


MAP per example-selection strategy:

Examples   Visual   +ASR
Single     .022     .132
All        .031     .149
Selected   .039     .151
Best       .050     .155


Conclusions

• Visual aspects of an information need are best captured by using multiple examples

• Combining results for multiple (good) examples in round-robin fashion, each ranked on both modalities, gives near-best performance for almost all topics