
MMRetrieval.net: A Multimodal Search Engine

Multimodal Information
• Single-language, text-only retrieval has reached a limit.
• Content-based Image Retrieval is computationally costly and still in its infancy.
• Digital information is increasingly becoming multimodal. Example: Wikipedia

Modality
Dictionary: a tendency to conform to a general pattern or belong to a particular group or category.

Definition of Modality in Information Retrieval
• It is unclear, fuzzy.
• 1st definition: Modality = Media
• 2nd definition: Modality = Data Stream

MMRetrieval.net: A Product of Cooperation
Started June 2010
• Avi Arampatzis, Lecturer, D.U.T.H.
• Konstantinos Zagoris, Ph.D., D.U.T.H.
• Savvas A. Chatzichristofis, Ph.D. candidate, D.U.T.H.

ImageCLEF 2010 Wikipedia Retrieval Task
ImageCLEF 2010 Wikipedia Collection, consisting of 237,434 items:
• Image: the primary medium
• Noisy and incomplete user-supplied textual annotations
• The Wikipedia articles containing the images
• Text written in any combination of English, German, French, or any other unidentified language

Wikipedia Collection

<image id="244845" file="images/25/244845.jpg">
  <name>Balloons Festival - Chateaux d'Oex.jpg</name>
  <text xml:lang="en">
    <description/>
    <comment/>
    <caption article="text/en/4/331622">Balloon festival </caption>
  </text>
  <text xml:lang="de">
    <description/>
    <comment/>
    <caption/>
  </text>
  <text xml:lang="fr">
    <description/>
    <comment/>
    <caption/>
  </text>
  <comment>(Balloon festival in Chateaux d'Oex. Category:Chateau d'Oex Category:Hot air balloons) </comment>
  <license>GFDL</license>
</image>
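As a rough illustration of how the textual fields of such a record could be pulled apart per language, here is a minimal Python sketch using the standard library's ElementTree. The function name `extract_fields` and the shape of the returned dictionary are assumptions made for this example; MMRetrieval.net itself is implemented in C#/.NET and its actual parsing code is not shown in these slides.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The sample record from the slide, reproduced as a string.
RECORD = """<image id="244845" file="images/25/244845.jpg">
  <name>Balloons Festival - Chateaux d'Oex.jpg</name>
  <text xml:lang="en">
    <description/><comment/>
    <caption article="text/en/4/331622">Balloon festival </caption>
  </text>
  <text xml:lang="de"><description/><comment/><caption/></text>
  <text xml:lang="fr"><description/><comment/><caption/></text>
  <comment>(Balloon festival in Chateaux d'Oex. Category:Chateau d'Oex Category:Hot air balloons) </comment>
  <license>GFDL</license>
</image>"""


def extract_fields(record_xml):
    """Return the image id, file name and per-language text fields (hypothetical helper)."""
    root = ET.fromstring(record_xml)
    fields = {
        "id": root.get("id"),
        "name": (root.findtext("name") or "").strip(),
        "by_lang": {},
    }
    for text in root.findall("text"):
        lang = text.get(XML_LANG)
        # Empty elements such as <caption/> yield an empty string.
        fields["by_lang"][lang] = {
            tag: (text.findtext(tag) or "").strip()
            for tag in ("description", "comment", "caption")
        }
    return fields


fields = extract_fields(RECORD)
```

In this record only the English caption carries text, which illustrates how sparse the annotations can be.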

ImageCLEF 2010 Wikipedia Retrieval Task
70 test topics, each consisting of a textual and a visual part:
• three title fields (one per language: English, German, French)
• one or more example images

Wikipedia Topic

<topic>
  <number>8</number>
  <title xml:lang="en">tennis player on court</title>
  <title xml:lang="de">tennisspieler auf dem platz</title>
  <title xml:lang="fr">joueur de tennis sur le terrain</title>
  <image>2197587684_94542c6fbd.jpg</image>
  <image>777629689_443a25ba08.jpg</image>
</topic>

Extraction of Modalities
• Visual descriptors: Joint Composite Descriptor (JCD), Spatial Color Distribution (SpCD)
• Text fields: description, comment, caption, article, name
• Languages: English, French, German
• Text indexing: Lemur Toolkit V4.11 and Indri V2.11 with the tf.idf retrieval model

MMRetrieval.net Structure

Fusion in Information Retrieval
• Combining evidence about relevance from different sources of information, here from several modalities
• Fusion consists of two components:
  • score normalization
  • score combination

Score Normalization
• The relevance scores of different modalities are not directly comparable.
• Scores of popular text retrieval models (tf.idf) can be turned into probabilities of relevance via the score-distributional method.
• Image descriptor scores do not fit this method. Alternatives:
  • MinMax (maps scores linearly to [0,1])
  • Zscore (maps a score to the number of standard deviations it lies above or below the mean score)
  • the non-linear Known-Item Aggregate Cumulative Density Function (KIACDF)
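The two linear normalizations named above can be sketched in a few lines of Python. This is a minimal illustration, not the system's code: the function names are hypothetical, the population standard deviation is an assumption (the slides do not say which variant is used), and returning zeros for a constant score list is a choice made here to avoid division by zero. KIACDF is omitted because the slides do not define it.

```python
from statistics import mean, pstdev


def min_max(scores):
    """Map scores linearly onto [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: no spread to normalize (assumed convention).
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def z_score(scores):
    """Map each score to its distance from the mean in standard deviations."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation (assumption)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


mm = min_max([2.0, 4.0, 6.0])
zs = z_score([1.0, 2.0, 3.0])
```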

Score Combination
• CompSUM
• CompMULT
• CompMAX
• CompMED
• CompWSUM
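The slide lists the combination methods by name only. Reading the names in the usual way (sum, product, maximum, median and weighted sum of one item's normalized scores across modalities), a minimal Python sketch could look as follows. The function names and these interpretations are assumptions based on the method names, not definitions taken from the slides.

```python
from math import prod
from statistics import median


def comb_sum(scores):
    """Sum of an item's normalized scores across modalities."""
    return sum(scores)


def comb_mult(scores):
    """Product of the scores."""
    return prod(scores)


def comb_max(scores):
    """Largest single score."""
    return max(scores)


def comb_med(scores):
    """Median score."""
    return median(scores)


def comb_wsum(scores, weights):
    """Weighted sum; one weight per modality (weights are hypothetical)."""
    return sum(w * s for w, s in zip(weights, scores))


item_scores = [0.2, 0.5, 0.8]  # one item's normalized scores in three modalities
weighted = comb_wsum(item_scores, [0.5, 0.25, 0.25])
```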

Results

Rank  Participant  MAP     | Participant  P@10    | Participant  P@20
1     xrce         0.2765  | xrce         0.6114  | xrce         0.5407
2     unt          0.2251  | duth         0.5200  | duth         0.4836
3     telecom      0.2227  | i2rcviu      0.4971  | telecom      0.4407
4     i2rcviu      0.2126  | cheshire     0.4929  | cheshire     0.4364
5     dcu          0.2039  | telecom      0.4914  | sztaki       0.4329
6     cheshire     0.2014  | sztaki       0.4857  | i2rcviu      0.4321
7     duth         0.1998  | daedalus     0.4471  | daedalus     0.4029
8     uned         0.1927  | unt          0.4314  | unt          0.3986
9     daedalus     0.1820  | dcu          0.4271  | dcu          0.3907
10    sztaki       0.1794  | uned         0.4200  | uned         0.3671
11    nus          0.1581  | nus          0.3529  | nus          0.3264
12    rgu          0.0617  | rgu          0.2271  | uaic         0.1529
13    uaic         0.0423  | uaic         0.1543  | rgu          0.1514

Corrected Results

Rank  Participant  MAP     | Participant  P@10    | Participant  P@20
1     xrce         0.2765  | xrce         0.6114  | xrce         0.5407
2     duth         0.2561  | duth         0.5257  | duth         0.4900
3     unt          0.2251  | i2rcviu      0.4971  | telecom      0.4407
4     telecom      0.2227  | cheshire     0.4929  | cheshire     0.4364
5     i2rcviu      0.2126  | telecom      0.4914  | sztaki       0.4329
6     dcu          0.2039  | sztaki       0.4857  | i2rcviu      0.4321
7     cheshire     0.2014  | daedalus     0.4471  | daedalus     0.4029
8     uned         0.1927  | unt          0.4314  | unt          0.3986
9     daedalus     0.1820  | dcu          0.4271  | dcu          0.3907
10    sztaki       0.1794  | uned         0.4200  | uned         0.3671
11    nus          0.1581  | nus          0.3529  | nus          0.3264
12    rgu          0.0617  | rgu          0.2271  | uaic         0.1529
13    uaic         0.0423  | uaic         0.1543  | rgu          0.1514

Fusion Problems
• Appropriate weighting of modalities and score normalization/combination are not trivial problems.
• If results are assessed by visual similarity only, fusion is not a theoretically sound method.

Content-based Image Retrieval Problems
• Content-based Image Retrieval (CBIR) with global features is notoriously noisy for image queries of low generality, where generality is the fraction of relevant images in a collection.
• It does not scale up well to large databases efficiency-wise.

Two-Stage Image Retrieval
• How it works: first use the secondary modality to rank the collection, then perform CBIR only on the top-K items.
• Assumption: there is a primary (image) and a secondary (text) modality.
• Hypothesis: CBIR can do better than text retrieval in small sets or in sets of high query generality.
• Efficiency benefit: using a 'cheaper' secondary modality also improves efficiency by cutting down on costly CBIR operations.
• Possible drawback: relevant images with empty or very noisy secondary modalities would be completely missed.
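The two-stage idea can be sketched in Python as follows, under some assumptions: `text_scores` is a hypothetical mapping from item id to a first-stage text score, `image_score` is a hypothetical callable standing in for the costly CBIR comparison, and K is passed in as a fixed number. Appending the items below rank K in their text order is a choice made for this example; the slides only say that CBIR is performed on the top-K items.

```python
def two_stage(text_scores, image_score, k):
    """Rank by the secondary (text) modality, then re-rank only the top-k by image score."""
    # Stage 1: rank the whole collection with the cheap text scores.
    first_stage = sorted(text_scores, key=text_scores.get, reverse=True)
    top, rest = first_stage[:k], first_stage[k:]
    # Stage 2: the costly image scorer is called only for the top-k items.
    reranked = sorted(top, key=image_score, reverse=True)
    # Items below rank k keep their text order (assumed convention).
    return reranked + rest


# Toy data: item "d" is visually the best match but has a poor text score.
text = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
visual = {"a": 0.1, "b": 0.5, "c": 0.9, "d": 1.0}
calls = []


def score_image(item):
    calls.append(item)  # record which items pay the CBIR cost
    return visual[item]


ranking = two_stage(text, score_image, k=3)
```

With K=3 the image scorer is never called for item "d", which shows both the efficiency benefit and the drawback listed above: a visually relevant item with a weak textual annotation stays out of the re-ranked set.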

Previous Work
• Re-ranking the best results by visual content has been seen before, mostly in different setups.
• All these approaches employed a static, predefined K for all queries.
• It is not clear if it works.

Our Two-Stage Method
• Dynamic: K is calculated dynamically per query
• to optimize a predefined effectiveness measure
• without using external information or training data
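The slides do not say how K is computed, so the following Python sketch is only one possible reading of "optimize a predefined effectiveness measure": it assumes that the first-stage text scores have already been turned into probabilities of relevance, and it picks the cutoff that maximizes the expected F1 of the top-K set. The function name `choose_k`, the choice of F1 and the input format are all assumptions made for this example, not the authors' published method.

```python
def choose_k(probs):
    """Pick the cutoff k that maximizes expected F1.

    probs: estimated probabilities of relevance, in first-stage rank order
    (hypothetical input; assumed to come from score normalization).
    """
    total = sum(probs)  # expected number of relevant items overall
    if not probs or total == 0:
        return 0
    best_k, best_f1, expected_hits = 0, -1.0, 0.0
    for k, p in enumerate(probs, start=1):
        expected_hits += p
        # F1 = 2 * hits / (retrieved + relevant), using expected counts.
        f1 = 2 * expected_hits / (k + total)
        if f1 > best_f1:
            best_k, best_f1 = k, f1
    return best_k


k_sharp = choose_k([0.9, 0.8, 0.1, 0.05])  # relevance concentrated at the top
k_flat = choose_k([0.5, 0.5, 0.5, 0.5])    # relevance spread evenly
```

A query whose relevant items are concentrated at the top gets a small K, while a query with evenly spread relevance gets a large one, so no single static K has to fit all queries.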

Retrieval Results
[Figure: results for the example query "cockpit of an airplane" under four settings: Image Only, Text Only, Static K=25, Dynamic K]

Best Fusion Method: Max of Sums

$$s = (1 - w)\,\max_i \Big( \sum_j \mathrm{MinMax}(\mathrm{DESC}_{ji}) \Big) + w\,\mathrm{MinMax}(\mathrm{tf.idf})$$

• i: the index running over the example images (i = 1, 2, …)
• j: the index running over the visual descriptors (j ∈ {1, 2})
• DESC_ji: the score against the i-th example image for the j-th descriptor
• w: parameter that controls the relative contribution of the two media
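To make the max-of-sums formula concrete, here is a minimal Python sketch that computes the fused score s for every item. It rests on several assumptions: `desc_scores[j][i]` is a hypothetical dictionary of item scores for descriptor j against example image i, higher scores mean more similar, and MinMax is applied across items separately for each score list. The slides do not spell out these details.

```python
def min_max(scores):
    """MinMax-normalize a dict of item -> score onto [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def max_of_sums(desc_scores, text_scores, w):
    """s = (1 - w) * max_i(sum_j MinMax(DESC_ji)) + w * MinMax(tf.idf)."""
    norm = [[min_max(per_image) for per_image in per_desc] for per_desc in desc_scores]
    norm_text = min_max(text_scores)
    n_images = len(norm[0])
    fused = {}
    for item in text_scores:
        # Sum over descriptors j, then take the best example image i.
        visual = max(sum(per_desc[i][item] for per_desc in norm) for i in range(n_images))
        fused[item] = (1 - w) * visual + w * norm_text[item]
    return fused


# Toy data: two descriptors (j), two example images (i), two items.
desc_scores = [
    [{"a": 1.0, "b": 0.0}, {"a": 0.0, "b": 2.0}],  # descriptor 1
    [{"a": 4.0, "b": 2.0}, {"a": 1.0, "b": 3.0}],  # descriptor 2
]
text_scores = {"a": 10.0, "b": 5.0}
fused = max_of_sums(desc_scores, text_scores, w=0.5)
```

Note that with two descriptors the visual term ranges over [0, 2] while the text term ranges over [0, 1], so w alone does not put the two media on an equal footing.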

Fusion vs Two-Stage

Implementation
• developed in the C#/.NET Framework 4.0
• HTML, CSS and JavaScript (AJAX) technologies for the interface
• requires a fairly modern browser

Directions for Further Research
• Multi-stage retrieval for multimodal databases based on a modality hierarchy.
• Fuzzy fusion (replace w with a membership function m).
• Create artificial modalities (not only from relevance scores).
• Pseudo-relevance feedback and cross-media feedback.

Publications
• Fusion vs Two-Stage for Multimodal Retrieval. Avi Arampatzis, Konstantinos Zagoris, and Savvas A. Chatzichristofis. In: ECIR 2010, Under Review.
• Dynamic Two-Stage Image Retrieval from Large Multimodal Databases. Avi Arampatzis, Konstantinos Zagoris, and Savvas A. Chatzichristofis. In: ECIR 2010, Under Review.
• Multimedia Search with Noisy Modalities: Fusion and Multistage Retrieval. Avi Arampatzis, Savvas A. Chatzichristofis, and Konstantinos Zagoris. In: CLEF (Notebook Papers/LABs/Workshops), 22-23 September, Padua, Italy, 2010.
• www.MMRetrieval.net: A Multimodal Search Engine. Konstantinos Zagoris, Avi Arampatzis, and Savvas A. Chatzichristofis. In: Proceedings of the 3rd International Conference on SImilarity Search and APplications, SISAP 2010, Istanbul, Turkey, September 18-19, 2010. © Association for Computing Machinery (ACM).

Thank you!
