MMRetrieval.net: A Multimodal Search Engine
Multimodal Information
Single-language, text-only retrieval has reached its limits. Content-based image retrieval is computationally costly and still in its infancy.
Digital information is increasingly becoming multimodal. Example: Wikipedia.
Modality (dictionary definition): a tendency to conform to a general pattern or to belong to a particular group or category.
Definition of modality in information retrieval: the term is unclear and fuzzy.
1st definition: modality = media
2nd definition: modality = data stream
MMRetrieval.net: a product of cooperation, started June 2010
Avi Arampatzis, Lecturer, D.U.T.H.
Konstantinos Zagoris, Ph.D., D.U.T.H.
Savvas A. Chatzichristofis, Ph.D. candidate, D.U.T.H.
ImageCLEF 2010 Wikipedia Retrieval Task
The ImageCLEF 2010 Wikipedia collection consists of 237,434 items:
images (the primary medium);
noisy and incomplete user-supplied textual annotations;
the Wikipedia articles containing the images.
The text is written in any combination of English, German, French, or other unidentified languages.
Wikipedia Collection
<image id="244845" file="images/25/244845.jpg">
  <name>Balloons Festival - Chateaux d'Oex.jpg</name>
  <text xml:lang="en">
    <description/>
    <comment/>
    <caption article="text/en/4/331622">Balloon festival </caption>
  </text>
  <text xml:lang="de">
    <description/>
    <comment/>
    <caption/>
  </text>
  <text xml:lang="fr">
    <description/>
    <comment/>
    <caption/>
  </text>
  <comment>(Balloon festival in Chateaux d'Oex. Category:Chateau d'Oex Category:Hot air balloons) </comment>
  <license>GFDL</license>
</image>
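As a rough illustration of how the textual fields of such a record might be pulled out before indexing, the following sketch uses only Python's standard xml.etree.ElementTree. The function name text_fields and the flat field naming (e.g. caption_en) are hypothetical choices for this example; the sketch also ignores the linked article text that the caption's article attribute points to.

```python
import xml.etree.ElementTree as ET

# Copy of the sample record above.
RECORD = """<image id="244845" file="images/25/244845.jpg">
  <name>Balloons Festival - Chateaux d'Oex.jpg</name>
  <text xml:lang="en">
    <description/>
    <comment/>
    <caption article="text/en/4/331622">Balloon festival </caption>
  </text>
  <text xml:lang="de"><description/><comment/><caption/></text>
  <text xml:lang="fr"><description/><comment/><caption/></text>
  <comment>(Balloon festival in Chateaux d'Oex. Category:Chateau d'Oex Category:Hot air balloons) </comment>
  <license>GFDL</license>
</image>"""

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def text_fields(xml_string):
    """Collect the non-empty textual annotations of one image record (hypothetical helper)."""
    root = ET.fromstring(xml_string)
    # findtext only looks at direct children, so this picks the top-level <comment>.
    fields = {"name": (root.findtext("name") or "").strip(),
              "comment": (root.findtext("comment") or "").strip()}
    for block in root.findall("text"):
        lang = block.get(XML_LANG)
        for tag in ("description", "comment", "caption"):
            value = (block.findtext(tag) or "").strip()
            if value:  # empty elements such as <caption/> are skipped
                fields[f"{tag}_{lang}"] = value
    return fields
```

For this record only the English caption is non-empty, so the German and French blocks contribute nothing.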
ImageCLEF 2010 Wikipedia Retrieval Task
70 test topics, each consisting of a textual and a visual part:
three title fields, one per language (English, German, French);
one or more example images.
Wikipedia Topic
<topic>
  <number>8</number>
  <title xml:lang="en">tennis player on court</title>
  <title xml:lang="de">tennisspieler auf dem platz</title>
  <title xml:lang="fr">joueur de tennis sur le terrain</title>
  <image>2197587684_94542c6fbd.jpg</image>
  <image>777629689_443a25ba08.jpg</image>
</topic>
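A topic can be split into its textual and visual parts in the same way. The sketch below is a minimal illustration with Python's standard-library XML parser; parse_topic is a hypothetical helper, not part of any ImageCLEF tooling.

```python
import xml.etree.ElementTree as ET

# Copy of the sample topic above.
TOPIC = """<topic>
  <number>8</number>
  <title xml:lang="en">tennis player on court</title>
  <title xml:lang="de">tennisspieler auf dem platz</title>
  <title xml:lang="fr">joueur de tennis sur le terrain</title>
  <image>2197587684_94542c6fbd.jpg</image>
  <image>777629689_443a25ba08.jpg</image>
</topic>"""

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def parse_topic(xml_string):
    """Return (topic number, {language: title}, [example image file names])."""
    root = ET.fromstring(xml_string)
    number = int(root.findtext("number"))
    # Textual part: one title per language, keyed by its xml:lang attribute.
    titles = {t.get(XML_LANG): t.text.strip() for t in root.findall("title")}
    # Visual part: the example images, in document order.
    images = [img.text.strip() for img in root.findall("image")]
    return number, titles, images
```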
Extraction of Modalities
Visual descriptors: Joint Composite Descriptor (JCD) and Spatial Color Distribution (SpCD)
Textual fields: description, comment, caption, article, name
Languages: English, French, German
Lemur Toolkit V4.11 and Indri V2.11 with the tf.idf retrieval model
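For readers unfamiliar with tf.idf, the following sketch scores documents by summing term frequency times inverse document frequency over the query terms. It is a generic textbook variant, assumed here for illustration (whitespace tokenization, raw tf, idf = log(N/df)); it is not Lemur's implementation, whose tf.idf model weights term frequency differently.

```python
import math
from collections import Counter


def tfidf_scores(query, docs):
    """Generic tf.idf: score(d) = sum over query terms t of tf(t, d) * log(N / df(t))."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        # Terms that occur in no document (df == 0) are skipped to avoid division by zero.
        scores.append(sum(tf[t] * math.log(n / df[t])
                          for t in query.lower().split() if df[t]))
    return scores
```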
MMRetrieval.net Structure
Fusion in Information Retrieval
Fusion combines evidence about relevance from different sources of information, here from several modalities.
It consists of two components: score normalization and score combination.
Score Normalization
Relevance scores from different modalities are not directly comparable.
Popular text retrieval models (e.g. tf.idf) can be turned into probabilities of relevance via the score-distributional method, but this approach does not fit image descriptors.
Normalization methods:
MinMax (maps scores linearly onto [0,1])
Z-score (maps each score to the number of standard deviations it lies above or below the mean score)
Known-Item Aggregate Cumulative Density Function (KIACDF), a non-linear method
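The two linear normalizations can be sketched in a few lines of Python. Mapping a constant score list to zeros and using the population standard deviation are assumptions of this sketch; KIACDF is not shown, since it depends on known-item score distributions that are not described here.

```python
import statistics


def minmax(scores):
    """Map scores linearly onto [0, 1]; a constant list maps to all zeros (assumption)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def zscore(scores):
    """Number of (population) standard deviations each score lies above or below the mean."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]
```

For example, minmax([2, 4, 6]) gives [0.0, 0.5, 1.0].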
Score Combination: CombSUM, CombMULT, CombMAX, CombMED, CombWSUM
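A minimal sketch of these combination operators, assuming each modality's scores have already been normalized and are aligned item by item. The function name combine and its method strings are hypothetical; for an even number of modalities the median is the mean of the two middle values, as Python's statistics.median computes it.

```python
import math
import statistics


def combine(score_lists, method="sum", weights=None):
    """Fuse normalized scores; score_lists holds one list per modality, aligned by item."""
    per_item = list(zip(*score_lists))  # one tuple of modality scores per item
    if method == "sum":
        return [sum(s) for s in per_item]
    if method == "mult":
        return [math.prod(s) for s in per_item]
    if method == "max":
        return [max(s) for s in per_item]
    if method == "med":
        return [statistics.median(s) for s in per_item]
    if method == "wsum":
        if weights is None or len(weights) != len(score_lists):
            raise ValueError("wsum needs one weight per modality")
        return [sum(w * x for w, x in zip(weights, s)) for s in per_item]
    raise ValueError(f"unknown method: {method}")
```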
Results

Rank  Participant  MAP
1     xrce         0.2765
2     unt          0.2251
3     telecom      0.2227
4     i2rcviu      0.2126
5     dcu          0.2039
6     cheshire     0.2014
7     duth         0.1998
8     uned         0.1927
9     daedalus     0.1820
10    sztaki       0.1794
11    nus          0.1581
12    rgu          0.0617
13    uaic         0.0423

Rank  Participant  P@10
1     xrce         0.6114
2     duth         0.5200
3     i2rcviu      0.4971
4     cheshire     0.4929
5     telecom      0.4914
6     sztaki       0.4857
7     daedalus     0.4471
8     unt          0.4314
9     dcu          0.4271
10    uned         0.4200
11    nus          0.3529
12    rgu          0.2271
13    uaic         0.1543

Rank  Participant  P@20
1     xrce         0.5407
2     duth         0.4836
3     telecom      0.4407
4     cheshire     0.4364
5     sztaki       0.4329
6     i2rcviu      0.4321
7     daedalus     0.4029
8     unt          0.3986
9     dcu          0.3907
10    uned         0.3671
11    nus          0.3264
12    uaic         0.1529
13    rgu          0.1514
Corrected Results

Rank  Participant  MAP
1     xrce         0.2765
2     duth         0.2561
3     unt          0.2251
4     telecom      0.2227
5     i2rcviu      0.2126
6     dcu          0.2039
7     cheshire     0.2014
8     uned         0.1927
9     daedalus     0.1820
10    sztaki       0.1794
11    nus          0.1581
12    rgu          0.0617
13    uaic         0.0423

Rank  Participant  P@10
1     xrce         0.6114
2     duth         0.5257
3     i2rcviu      0.4971
4     cheshire     0.4929
5     telecom      0.4914
6     sztaki       0.4857
7     daedalus     0.4471
8     unt          0.4314
9     dcu          0.4271
10    uned         0.4200
11    nus          0.3529
12    rgu          0.2271
13    uaic         0.1543

Rank  Participant  P@20
1     xrce         0.5407
2     duth         0.4900
3     telecom      0.4407
4     cheshire     0.4364
5     sztaki       0.4329
6     i2rcviu      0.4321
7     daedalus     0.4029
8     unt          0.3986
9     dcu          0.3907
10    uned         0.3671
11    nus          0.3264
12    uaic         0.1529
13    rgu          0.1514
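The measures in these tables are standard. As a sketch, precision at k and (mean) average precision can be computed from a ranked list and a set of relevant items as below; the function names are hypothetical, and this is an illustration rather than the evaluation software used for ImageCLEF.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved items that are relevant (divides by k, even if fewer were retrieved)."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking, relevant):
    """Sum of precision at each rank holding a relevant item, divided by the number of relevant items.

    Relevant items that were never retrieved contribute zero.
    """
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```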
Fusion Problems
Appropriate weighting of the modalities and score normalization/combination are not trivial problems.
If results are assessed by visual similarity only, fusion is not a theoretically sound method.
Content-based Image Retrieval Problems
Content-based image retrieval (CBIR) with global features is notoriously noisy for image queries of low generality, generality being the fraction of relevant images in the collection.
CBIR does not scale up well to large databases efficiency-wise.
Two-Stage Image Retrieval
How it works: first use the secondary modality to rank the collection, then perform CBIR only on the top-K items.
Assumption: image is the primary modality and text the secondary one.
Hypothesis: CBIR can do better than text retrieval on small sets or on sets of high query generality.
Efficiency benefit: using a 'cheaper' secondary modality also improves efficiency by cutting down on costly CBIR operations.
Possible drawback: relevant images with empty or very noisy secondary modalities would be completely missed.
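The two-stage procedure can be sketched as follows, assuming each stage yields one score per item. The names two_stage and cbir_score are hypothetical, and keeping the items below rank K in their text order is one plausible choice, not necessarily what MMRetrieval.net does. The CBIR scorer is called only on the top-K items, which is where the efficiency benefit comes from.

```python
def two_stage(text_scores, cbir_score, k):
    """Rank by text, then re-rank only the top-k items by CBIR.

    text_scores: dict item_id -> secondary-modality (text) score.
    cbir_score:  hypothetical callable item_id -> CBIR similarity to the query images.
    Items below rank k keep their text order (an assumption of this sketch).
    """
    by_text = sorted(text_scores, key=text_scores.get, reverse=True)
    head, tail = by_text[:k], by_text[k:]
    # sorted() calls the key function exactly once per item, so CBIR runs k times.
    reranked = sorted(head, key=cbir_score, reverse=True)
    return reranked + tail
```

In a toy run where an item has a weak text score but the best image score, it stays below the cut-off, which is exactly the drawback noted above.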
Previous Work
Best results: re-ranking by visual content has been seen before, mostly in different setups.
All these approaches employed a static, predefined K for all queries; it is not clear whether this works.
Our Two-Stage Method
Dynamic K: calculated per query so as to optimize a predefined effectiveness measure, without using external information or training data.
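One way to choose K per query without training data is sketched below. It assumes that a probability of relevance has already been estimated for each text-ranked item (for instance with a score-distributional model, not shown) and picks the K that maximizes an approximate expected F-measure. Both the inputs and the choice of measure are illustrative assumptions, not the exact procedure of the method described here; dynamic_k is a hypothetical name.

```python
def dynamic_k(probs, beta=1.0):
    """Pick the cutoff K maximizing an approximate expected F_beta.

    probs: ASSUMED precomputed probabilities of relevance of the text-ranked
           items, in rank order (their estimation is not shown).
    Approximation used: E[F_beta@K] ~ (1 + beta^2) * sum(probs[:K]) / (K + beta^2 * R),
    where R = sum(probs) is the expected number of relevant items.
    Returns 0 if the list is empty or all probabilities are zero.
    """
    expected_relevant = sum(probs)
    b2 = beta * beta
    best_k, best_f, cumulative = 0, 0.0, 0.0
    for k, p in enumerate(probs, start=1):
        cumulative += p  # expected number of relevant items in the top k
        f = (1 + b2) * cumulative / (k + b2 * expected_relevant)
        if f > best_f:
            best_k, best_f = k, f
    return best_k
```

Because the cut-off depends only on the estimated score distribution of the current query, different queries naturally receive different K values.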
Retrieval Results
[Figure: results for the query "cockpit of an airplane" with Image Only, Text Only, Static K=25, and Dynamic K]
Best Fusion Method: Max of Sums
i is the index running over the example images (i = 1, 2, …); j runs over the visual descriptors (j ∈ {1, 2}); DESC_{ji} is the score against the i-th example image for the j-th descriptor; the parameter w controls the relative contribution of the two media.
$$ s = (1-w)\,\max_{i}\Big(\sum_{j} \mathrm{MinMax}(\mathrm{DESC}_{ji})\Big) + w\,\mathrm{MinMax}(\mathrm{tf.idf}) $$
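The Max of Sums formula translates directly into code. In this sketch, desc[i][j] holds the scores of all items against example image i under descriptor j, MinMax is applied to each score list over the items being fused, and max_of_sums is a hypothetical name; the data layout and the scope of normalization are assumptions.

```python
def minmax(scores):
    """Linearly map a list of scores onto [0, 1] (constant lists map to zeros)."""
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]


def max_of_sums(desc, text, w):
    """s = (1 - w) * max_i(sum_j MinMax(DESC_ji)) + w * MinMax(tf.idf), per item.

    desc[i][j]: scores of every item against example image i under visual
                descriptor j (hypothetical layout, aligned by item index).
    text:       tf.idf scores of the same items, in the same order.
    w:          weight of the text modality, 0 <= w <= 1.
    """
    text_n = minmax(text)
    # For each example image i, sum the MinMax-normalized descriptor scores per item.
    per_example = [
        [sum(col) for col in zip(*(minmax(scores_j) for scores_j in example))]
        for example in desc
    ]
    # Max over example images, then mix with the normalized text score.
    return [
        (1 - w) * max(row[d] for row in per_example) + w * text_n[d]
        for d in range(len(text))
    ]
```

With w = 0 only the visual evidence counts and with w = 1 only the text, matching the role of w described above.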
Fusion vs Two-Stage
Implementation
• developed in C# on the .NET Framework 4.0
• HTML, CSS and JavaScript (AJAX) technologies for the interface
• requires a fairly modern browser
Directions for Further Research
Multi-stage retrieval for multimodal databases, based on a modality hierarchy.
Fuzzy fusion (replace w with a membership function m).
Creating artificial modalities (not only from relevance scores).
Pseudo-relevance feedback (cross-media feedback).
Publications
Fusion vs Two-Stage for Multimodal Retrieval. Avi Arampatzis, Konstantinos Zagoris, and Savvas A. Chatzichristofis. In: ECIR 2010, under review.
Dynamic Two-Stage Image Retrieval from Large Multimodal Databases. Avi Arampatzis, Konstantinos Zagoris, and Savvas A. Chatzichristofis. In: ECIR 2010, under review.
Multimedia Search with Noisy Modalities: Fusion and Multistage Retrieval. Avi Arampatzis, Savvas A. Chatzichristofis, and Konstantinos Zagoris. In: CLEF (Notebook Papers/LABs/Workshops), Padua, Italy, 22-23 September 2010.
www.MMRetrieval.net: A Multimodal Search Engine. Konstantinos Zagoris, Avi Arampatzis, and Savvas A. Chatzichristofis. In: Proceedings of the 3rd International Conference on Similarity Search and Applications (SISAP 2010), Istanbul, Turkey, 18-19 September 2010. © Association for Computing Machinery (ACM).
Thank you!