05-10-2012
Challenge the future Delft University of Technology
TUD MediaEval 2012 Tagging Task
Presenter: Martha A. Larson, Multimedia Information Retrieval Lab, Delft University of Technology
2 Visual similarity measures for semantic video retrieval
Outline
• TUD-MM: Multi-modality video categorization with one-vs-all classifiers
• Peng Xu, Yangyang Shi, Martha A. Larson
• MediaEval 2012 Tagging Task: Prediction based on One Best List and Confusion Networks
• Yangyang Shi, Martha A. Larson, Catholijn M. Jonker
TUD-MM: Multi-modality video categorization with one-vs-all classifiers
Peng Xu, Yangyang Shi, Martha A. Larson
Introduction
• Features from different modalities
  • Visual features
    • Visual-words-based representation & global video representation
  • Text features
    • ASR transcripts, metadata
    • Term frequency, LDA
• Classification and fusion
  • One-vs-all linear SVMs
  • Reciprocal Rank Fusion
  • Post-processing procedure to assign one category label to each video
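The one-vs-all decision and post-processing step (one label per video) can be sketched in miniature. Everything below — the toy categories, the 2-D features, and the hand-set linear scorers — is hypothetical and merely stands in for the trained one-vs-all linear SVMs:

```python
def one_vs_all_predict(feature, classifiers):
    """Score a feature vector with every binary linear scorer and keep
    the highest-scoring category (stand-in for trained linear SVMs)."""
    def score(weights, bias):
        return sum(w * x for w, x in zip(weights, feature)) + bias
    return max(classifiers, key=lambda label: score(*classifiers[label]))

# Hypothetical 2-D features and per-category linear scorers.
toy_classifiers = {
    "news":   ([1.0, 0.0], 0.0),
    "sports": ([0.0, 1.0], 0.0),
}
label = one_vs_all_predict([0.2, 0.9], toy_classifiers)  # picks "sports"
```

Taking the argmax over the per-category scores directly implements the "assign one category label to each video" post-processing.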
Visual representations
• Visual-words-based video representation
  • SIFT features are extracted from each key frame
  • The visual vocabulary is built by hierarchical k-means clustering
  • The normalized term frequency over the entire video is used as the representation
• Global video representation
  • Edit features
  • Content features
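The visual-words representation can be sketched as follows; a flat nearest-centre quantizer stands in for the hierarchical k-means vocabulary, and the toy 2-D "descriptors" stand in for real 128-D SIFT vectors:

```python
def quantize(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word (Euclidean)."""
    def nearest(d):
        return min(range(len(vocabulary)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(d, vocabulary[i])))
    return [nearest(d) for d in descriptors]

def bovw_histogram(descriptors, vocabulary):
    """Normalized term-frequency histogram over the visual vocabulary."""
    counts = [0] * len(vocabulary)
    for word in quantize(descriptors, vocabulary):
        counts[word] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

vocab = [(0.0, 0.0), (1.0, 1.0)]   # toy 2-D "visual words" (cluster centres)
hist = bovw_histogram([(0.1, 0.0), (0.9, 1.0), (1.1, 0.8)], vocab)
```

Pooling descriptors from all key frames of a video before normalizing gives the per-video term-frequency vector described above.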
Classification and Fusion
• One-vs-all linear SVM
  • C is determined by 5-fold cross-validation
• Reciprocal Rank Fusion (RRF)*
  • k = 60 balances the importance of lower-ranked items
  • The weights w(r) are determined from the cross-validation errors of each modality
• Post-processing procedure

* G. V. Cormack, C. L. A. Clarke, and S. Buettcher. Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. SIGIR '09, pages 758-759.
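RRF, as defined by Cormack et al., scores each item as score(d) = Σ_r w(r) / (k + rank_r(d)). A minimal sketch with k = 60; the item ids are illustrative and the uniform default weights stand in for the cross-validation-derived weights mentioned above:

```python
def rrf(rankings, k=60, weights=None):
    """Reciprocal Rank Fusion: score(d) = sum_r w(r) / (k + rank_r(d)).

    rankings: one ranked list of item ids per modality (best first).
    weights: optional per-ranking weights (uniform by default here).
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = {}
    for w, ranking in zip(weights, rankings):
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Two modality rankings over three hypothetical videos.
fused = rrf([["v1", "v2", "v3"], ["v2", "v3", "v1"]])
```

A large k dampens the contribution of any single high rank, which is why the slide describes k = 60 as balancing the importance of lower-ranked items.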
Result analysis
• MAP of different runs
• Run_1 to Run_5 are official runs
• Run_6 is the visual-only run without post-processing
• Run_7 is the visual-only run with global feature
Run_1 Run_2 Run_3 Run_4 Run_5 *Run_6 *Run_7
MAP 0.0061 0.3127 0.2279 0.3675 0.2157 0.0577 0.0047
Performance of visual features
[Figure: MAP of the random baseline, visual-words (VW), and global representations; all values fall below 0.025]
MediaEval 2012 Tagging Task: Prediction based on One Best List and Confusion Networks
Yangyang Shi, Martha A. Larson, Catholijn M. Jonker
Models for One-best Lists and Confusion Networks
• ASR output is fed to three models:
  • Support vector machines
  • Dynamic Bayesian networks
  • Conditional random fields
One-best List SVM
• Vocabulary with a frequency cut-off of 3; TF-IDF term weighting
• Linear-kernel multi-class SVM (C = 0.5)
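The text-side preprocessing above can be sketched as follows; the toy documents are illustrative, and the multi-class SVM itself is not shown:

```python
import math

def build_vocab(docs, min_count=3):
    """Keep terms occurring at least min_count times in the collection
    (the frequency cut-off of 3)."""
    counts = {}
    for doc in docs:
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
    return sorted(t for t, c in counts.items() if c >= min_count)

def tfidf(doc, docs, vocab):
    """TF-IDF vector for one document over the fixed vocabulary."""
    n = len(docs)
    vec = []
    for term in vocab:
        tf = doc.count(term)
        df = sum(1 for d in docs if term in d)
        vec.append(tf * (math.log(n / df) if df else 0.0))
    return vec

# Toy tokenized ASR transcripts; "rare" falls below the cut-off.
docs = [["news", "news", "sports"], ["news", "rare"], ["news", "sports", "sports"]]
vocab = build_vocab(docs)
vec = tfidf(docs[0], docs, vocab)
```

The resulting vectors would then be fed to the linear multi-class SVM described in the slide.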
One-best List DBN
[Figure: DBN structure over word nodes W1-W3 with associated nodes T1-T3 and E1-E3]
Results on the ASR-only Run

Model               MAP
Run2-one-best SVM   0.23
Run2-one-best DBN   0.25
Run2-one-best CRF   0.10
Run2-CN-CRF         0.09
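MAP, the metric reported here, averages per-query average precision. A minimal sketch with hypothetical video ids:

```python
def average_precision(ranked, relevant):
    """AP: mean of precision@k at each rank k where a relevant item appears,
    divided by the total number of relevant items."""
    hits, precisions = 0, []
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over (ranked_list, relevant_set) pairs, one per query/genre."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical ranking: relevant videos retrieved at ranks 1 and 3.
ap = average_precision(["v1", "v2", "v3", "v4"], {"v1", "v3"})  # (1 + 2/3) / 2
```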
Average Precision on Each Genre
[Figure: per-genre average precision for the DBN and SVM models; values range from 0 to 0.8]
Discussion and Future Work
• Discussion
  • Visual-only methods can be improved in several ways:
    • Feature selection or dimensionality reduction can be applied
    • Genre-level video representation
  • CRF failure:
    • A document is treated as an item rather than one word.
    • The feature set is too large for training to converge.
  • DBN outperforms SVM: sequence-order information probably helps prediction
• Future work
  • Generate clear and useful labels
Thank you!