Upload
ronna
View
49
Download
1
Tags:
Embed Size (px)
DESCRIPTION
Unsupervised Temporal Commonality Discovery Wen-Sheng Chu , Feng Zhou and Fernando De la Torre Robotics Institute, Carnegie Mellon University July 9, 2013. Unsupervised Commonality Discovery in Images. (Chu’10, Mukherjee’11, Collins’12). Where are the repeated patterns?. - PowerPoint PPT Presentation
Citation preview
Pre-CVPR Selective Transfer Machine for Facial AU Detection
Unsupervised Temporal Commonality Discovery
Wen-Sheng Chu, Feng Zhou and Fernando De la TorreRobotics Institute, Carnegie Mellon UniversityJuly 9, 20131
Unsupervised Commonality Discoveryin Images
Where are the repeated patterns?2
(Chu10, Mukherjee11, Collins12)Unsupervised Commonality Discoveryin Videos?We name it Temporal Commonality Discovery (TCD).Goal: Given two videos, discover common events in an unsupervised fashion.
3TCD is hard!1) No prior knowledge on commonalitiesWe do not know what, where and how many commonalities exist in the video2) Exhaustive search are computationally prohibitiveE.g., two videos with 300 frames have >8,000,000,000 possible matches.
possible locations possible lengths possibilities/sequence
Another possibilities/sequence
4Formulation
5Integer programming!Optimization: Interpretation
6
Optimization: Native SearchComplexity
7Optimization: Branch-and-BoundSimilar to the idea of ESS (Lampert08), we search the space by splitting intervals.
8Optimization: Branch-and-BoundBounding histogram bins
9
Bounding L1 distance:
Intersection similarity:
X2 distance:
Optimization: Branch-and-Bound
10Unlikelysearchregions(B1,E1,B2,E2; -10)Searching Structure
(B1,E1,B2,E2; 32)Priority queue(sorted by bound scores)(B1,E1,B2,E2; -50)(B1,E1,B2,E2; -105)
State S = (Rectangle set; score)11
(B1,E1,B2,E2; -105)Algorithm
(B1,E1,B2,E2; 32)Priority queue(sorted by bound scores)(B1,E1,B2,E2; -50)(B1,E1,B2,E2; -105)Top statePop out the top state
2. Split12(B1,E1,B2,E2; -105)Algorithm(B1,E1,B2,E2; 32)Priority queue(sorted by bound scores)(B1,E1,B2,E2; -50)Top state
(B1,E1,B2,E2; -76)
(B1,E1,B2,E2; -61)3. Compute bounding scores4. Push back thesplit states 13Algorithm(B1,E1,B2,E2; 32)Priority queue(sorted by bound scores)(B1,E1,B2,E2; -50)Top state(B1,E1,B2,E2; -76)(B1,E1,B2,E2; -61)The algorithm stop when the top state contains an unique rectangle.
Omit most of the search space withlarge distances14Compare with Relevant WorkDifference between TCD and ESS [1]/STBB[2]Different learning framework: Unsupervised v.s. Supervised New bounding functions for TCDDifference between TCD and [3]Different objective: Commonality Discovery v.s. Temporal Clustering
[1] Efficient subwindow search: A branch and bound framework for object localization, PAMI 2009.[2] Discriminative video pattern search for efficient action detection, PAMI 2011.[3] Unsupervised discovery of facial events, in CVPR 2010.15
Experiment (1): Synthesized Sequence
Histograms of the discovered pair of subsequences16Experiment (2): Discover Common Facial ActionsRU-FACS dataset*Interview videos with 29 subjects5000~8000 frames/videoCollect 100 segments that containing smiley mouths (AU-12)Evaluate in terms of averaged precision
17
* Automatic recognition of facial actions in spontaneous expressions, Journal of Multimedia 2006.Experiment (2): Discover Common Facial Actions
18Parametric settings for Sliding Windows (SW)
Log of #evaluations:
Quality of discovered patterns:a
Experiment (2): Speed EvaluationSpeed #evaluation of the distance function
19Experiment (2): Discover Common Facial ActionsCompare with LCCS* on -distance20* Frame-level temporal calibration of unsynchronized cameras by using Longest Consecutive Common Subsequence, ICASSP 2009.
Experiment (3): Discover Multiple Common Human MotionsCMU-Mocap dataset:http://mocap.cs.cmu.edu/15 sequences from Subject 861200~2600 frames and up to 10 actions/seqExclude the comparison with SW because it needs >1012 evaluations
21
Experiment (3): Discover Multiple Common Human Motions
22Experiment (3): Discover Multiple Common Human MotionsCompare with LCCS* on -distance23
Extension: Video IndexingGoal: Given a query , find the best common subsequence in the target videoA straightforward extension:
TemporalSearchSpace
24A Prototype for Video Indexing25
Summary26Questions?[1] Common Visual Pattern Discovery via Spatially Coherent Correspondences, In CVPR 2010.[2] MOMI-cosegmentation: simultaneous segmentation of multiple objects among multiple images, In ACCV 2010.[3] Scale invariant cosegmentation for image groups, In CVPR 2011.[4] Random walks based multi-image segmentation: Quasiconvexity results and GPU-based solutions, In CVPR 2012.[5] Frame-level temporal calibration of unsynchronized cameras by using Longest Consecutive Common Subsequence, In ICASSP 2009.[6] Efficient ESS with submodular score functions, In CVPR 2011.
27
http://humansensing.cs.cmu.edu/wschu/