05-10-2012
Challenge the future Delft University of Technology
TUD MediaEval 2012 Tagging Task
Presenter: Martha A. Larson, Multimedia Information Retrieval Lab, Delft University of Technology
2 Visual similarity measures for semantic video retrieval
Outline
• TUD-MM: Multi-modality video categorization with one-vs-all classifiers
• Peng Xu, Yangyang Shi, Martha A. Larson
• MediaEval 2012 Tagging Task: Prediction based on One Best List and Confusion Networks
• Yangyang Shi, Martha A. Larson, Catholijn M. Jonker
TUD-MM: Multi-modality video categorization with one-vs-all classifiers
Peng Xu, Yangyang Shi, Martha A. Larson
Introduction
• Features from different modalities
  • Visual features
    • Visual-words-based representation & global video representation
  • Text features
    • ASR transcripts, metadata
    • Term frequency, LDA
• Classification and fusion
  • One-vs-all linear SVMs
  • Reciprocal Rank Fusion
  • Post-processing procedure to assign one category label to each video
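The one-vs-all decision and post-processing step (one label per video) can be sketched in miniature. Everything below — the toy categories, the 2-D features, and the hand-set linear scorers — is hypothetical and merely stands in for the trained one-vs-all linear SVMs:

```python
def one_vs_all_predict(feature, classifiers):
    """Score a feature vector with every binary linear scorer and keep
    the highest-scoring category (stand-in for trained linear SVMs)."""
    def score(weights, bias):
        return sum(w * x for w, x in zip(weights, feature)) + bias
    return max(classifiers, key=lambda label: score(*classifiers[label]))

# Hypothetical 2-D features and per-category linear scorers.
toy_classifiers = {
    "news":   ([1.0, 0.0], 0.0),
    "sports": ([0.0, 1.0], 0.0),
}
label = one_vs_all_predict([0.2, 0.9], toy_classifiers)  # picks "sports"
```

Taking the argmax over the per-category scores directly implements the "assign one category label to each video" post-processing.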
Visual representations
• Visual-words-based video representation
  • SIFT features are extracted from each key frame
  • The visual vocabulary is built by hierarchical k-means clustering
  • The normalized term frequency over the entire video is used as the representation
• Global video representation
  • Edit features
  • Content features
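The visual-words representation can be sketched as follows; a flat nearest-centre quantizer stands in for the hierarchical k-means vocabulary, and the toy 2-D "descriptors" stand in for real 128-D SIFT vectors:

```python
def quantize(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word (Euclidean)."""
    def nearest(d):
        return min(range(len(vocabulary)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(d, vocabulary[i])))
    return [nearest(d) for d in descriptors]

def bovw_histogram(descriptors, vocabulary):
    """Normalized term-frequency histogram over the visual vocabulary."""
    counts = [0] * len(vocabulary)
    for word in quantize(descriptors, vocabulary):
        counts[word] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

vocab = [(0.0, 0.0), (1.0, 1.0)]   # toy 2-D "visual words" (cluster centres)
hist = bovw_histogram([(0.1, 0.0), (0.9, 1.0), (1.1, 0.8)], vocab)
```

Pooling descriptors from all key frames of a video before normalizing gives the per-video term-frequency vector described above.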
Classification and Fusion
• One-vs-all linear SVM
  • C is determined by 5-fold cross-validation
• Reciprocal Rank Fusion (RRF)*
  • k = 60 balances the importance of lower-ranked items
  • The weights w(r) are determined from the cross-validation errors of each modality
• Post-processing procedure

* G. V. Cormack, C. L. A. Clarke, and S. Buettcher. Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. SIGIR '09, pages 758-759.
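RRF, as defined by Cormack et al., scores each item as score(d) = Σ_r w(r) / (k + rank_r(d)). A minimal sketch with k = 60; the item ids are illustrative and the uniform default weights stand in for the cross-validation-derived weights mentioned above:

```python
def rrf(rankings, k=60, weights=None):
    """Reciprocal Rank Fusion: score(d) = sum_r w(r) / (k + rank_r(d)).

    rankings: one ranked list of item ids per modality (best first).
    weights: optional per-ranking weights (uniform by default here).
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = {}
    for w, ranking in zip(weights, rankings):
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Two modality rankings over three hypothetical videos.
fused = rrf([["v1", "v2", "v3"], ["v2", "v3", "v1"]])
```

A large k dampens the contribution of any single high rank, which is why the slide describes k = 60 as balancing the importance of lower-ranked items.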
Result analysis
• MAP of different runs
• Run_1 to Run_5 are official runs
• Run_6 is the visual-only run without post-processing
• Run_7 is the visual-only run with global feature
Run_1 Run_2 Run_3 Run_4 Run_5 *Run_6 *Run_7
MAP 0.0061 0.3127 0.2279 0.3675 0.2157 0.0577 0.0047
Performance of visual features
[Figure: MAP of the random baseline, visual-words (VW), and global representations; all values fall below 0.025]
MediaEval 2012 Tagging Task: Prediction based on One Best List and Confusion Networks
Yangyang Shi, Martha A. Larson, Catholijn M. Jonker
Models for One-best Lists and Confusion Networks
• ASR output is fed to three models:
  • Support vector machines
  • Dynamic Bayesian networks
  • Conditional random fields
One-best List SVM
• Vocabulary with a frequency cut-off of 3; TF-IDF term weighting
• Linear-kernel multi-class SVM (C = 0.5)
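The text-side preprocessing above can be sketched as follows; the toy documents are illustrative, and the multi-class SVM itself is not shown:

```python
import math

def build_vocab(docs, min_count=3):
    """Keep terms occurring at least min_count times in the collection
    (the frequency cut-off of 3)."""
    counts = {}
    for doc in docs:
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
    return sorted(t for t, c in counts.items() if c >= min_count)

def tfidf(doc, docs, vocab):
    """TF-IDF vector for one document over the fixed vocabulary."""
    n = len(docs)
    vec = []
    for term in vocab:
        tf = doc.count(term)
        df = sum(1 for d in docs if term in d)
        vec.append(tf * (math.log(n / df) if df else 0.0))
    return vec

# Toy tokenized ASR transcripts; "rare" falls below the cut-off.
docs = [["news", "news", "sports"], ["news", "rare"], ["news", "sports", "sports"]]
vocab = build_vocab(docs)
vec = tfidf(docs[0], docs, vocab)
```

The resulting vectors would then be fed to the linear multi-class SVM described in the slide.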
One-best List DBN
[Figure: DBN structure over word nodes W1-W3 with associated nodes T1-T3 and E1-E3]
Results on the ASR-only Run

Model               MAP
Run2-one-best SVM   0.23
Run2-one-best DBN   0.25
Run2-one-best CRF   0.10
Run2-CN-CRF         0.09
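MAP, the metric reported here, averages per-query average precision. A minimal sketch with hypothetical video ids:

```python
def average_precision(ranked, relevant):
    """AP: mean of precision@k at each rank k where a relevant item appears,
    divided by the total number of relevant items."""
    hits, precisions = 0, []
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over (ranked_list, relevant_set) pairs, one per query/genre."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical ranking: relevant videos retrieved at ranks 1 and 3.
ap = average_precision(["v1", "v2", "v3", "v4"], {"v1", "v3"})  # (1 + 2/3) / 2
```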
Average Precision on Each Genre
[Figure: per-genre average precision for the DBN and SVM models; values range from 0 to 0.8]
Discussion and Future Work
• Discussion
  • Visual-only methods can be improved in several ways:
    • Feature selection or dimensionality reduction can be applied
    • Genre-level video representation
  • CRF failure:
    • A document is treated as an item rather than one word.
    • The feature set is too large for training to converge.
  • DBN outperforms SVM: sequence-order information probably helps prediction
• Future work
  • Generate clear and useful labels
Thank you!