Temporally Coherent Interpretations ... - cv-foundation.org · Temporally Coherent Interpretations...

Temporally Coherent Interpretations for Long Videos Using Pattern Theory

Fillipe Souza1, Sudeep Sarkar1, Anuj Srivastava2, Jingyong Su3

1Computer Science & Engineering, University of South Florida. 2Statistics, Florida State University. 3Mathematics & Statistics, Texas Tech University.

Graph-theoretical methods have successfully provided semantic and struc-tural interpretations of images and videos. A recent paper [1] introduceda pattern-theoretic approach that allows construction of flexible graphs forrepresenting interactions of actors with objects and inference is accomplishedby an efficient annealing algorithm. Actions and objects are termed gen-erators and their interactions are termed bonds; together they form high-probability configurations, or interpretations, of observed scenes. This workand other structural methods have generally been limited to analyzing shortvideos involving isolated actions.

In this paper, we provide an extension that uses additional temporalbonds across individual actions to enable semantic interpretations of longervideos. Longer temporal connections improve scene interpretations as theyhelp discard (temporally) local solutions in favor of globally superior ones.Using this extension, we demonstrate improvements in understanding longervideos, compared to individual interpretations of non-overlapping time seg-ments. We verified the success of our approach by generating interpretationsfor more than 700 video segments from the YouCook data set, with intricatevideos that exhibit cluttered background, scenarios of occlusion, viewpointvariations and changing conditions of illumination. Interpretations for longvideo segments were able to yield performance increases of about 70% and,in addition, proved to be more robust to different severe scenarios of classi-fication errors.

(a) (b) (c) (d)

(e) (f)

Figure 1: Illustration of advantage in using temporal bonds. Top rows showsframes from two consecutive segments of a video. The first segment depictsthe interaction put bowl down (the small one with the left hand) and secondsegment depicts stir ingredients in a bowl using spatula. (e) shows [1]’sinterpretations for both segments. (f) shows our approach’s interpretationfor both segments. Shaded circles denote correctly identified generators.

In pattern theory [2], the basic units of representations are generators g.We define the space of generators G= {g1, ...,gn} hierarchically - from (im-age) features at the bottom level to (human) actions at the highest, such thatG = G f eatures ∪Gob jects ∪Gactions. Such generators gi combine with eachother through bonds β j(gi) that satisfy predefined constraints (Figure 2).The resulting configurations of connected generators σ(g1, · · · ,gn) repre-sent a semantic understanding of the video content, an interpretation. Theinference goal is to generate high-probability interpretations given a set offeatures. Probabilistic structures are imposed using energies that have con-tributions from both data (classification scores) and prior information (on-tological constraints). The probability of an interpretation σ(g1, · · · ,gn) isexpressed as a product of terms associated with generators and bond inter-actions, written as

p(σ(.)) =∏

n(k,k′ )∈σ

A1/T (β j(gi),β j′ (gi′ ))

Z(T ), (1)

This is an extended abstract. The full paper is available at the Computer Vision Foundationwebpage.

where k = β j(gi) and k′= β j′ (gi′ ) denote bonds of generators, Z(T ) is the

partition function, T is set to 1, and n denotes the number of generatorsthat form an interpretation. Thus, its energy equivalent form is E(σ(.)) =− log p(σ(.))Z(T ), which results in

E(σ(.)) = − ∑(k,k′ )∈σ

logA(β j(gi),β j′ (gi′ )) (2)

The search for optimal interpretations is accomplished by minimizing the

Figure 2: An illustration showing how generators are combined with eachother using bonds and have their bond interaction energies computed.

energy function E(σ) using an MCMC-based simulated annealing algo-rithm that uses simple moves to propose interpretation changes and to acceptor reject them according to the posterior energy. A summary of the perfor-mance analysis in shown in Figure 3. Using temporal bonds and analyzinglonger videos allowed us to achieve up to 83% performance improvement.

Figure 3: Left:performance rate comparison for 1-segment video sequences.Right: performance rate comparison for 4-segment video sequences. Notethat temporal bonds are crucial to improve interpretation performance.

Acknowledgment: This research was supported in part by NSF grants1217515 and 1217676.

[1] Fillipe D M de Souza, Sudeep Sarkar, Anuj Srivastava, and JingyongSu. Pattern theory-based interpretation of activities. In ICPR, 2014.

[2] Ulf Grenander. General pattern theory-A mathematical study of regularstructures. Clarendon Press, 1993.

Temporally Coherent Interpretations ... - cv-foundation.org · Temporally Coherent Interpretations...

Documents

Temporally Coherent Clustering of Student Data · Session t = 3 Session t = 6 0.0 1.0 0.5 Transition probability Performance Performance Performance Performance Performance Performance

TESTING FOR TEMPORALLY DIFFERENTIATED RELATIONSHIPS AMONG POTENTIALLY ... · TESTING FOR TEMPORALLY DIFFERENTIATED RELATIONSHIPS AMONG POTENTIALLY ... JERRY H. RATCLIFFE Department

Temporally coherent 4D reconstruction of complex dynamic scenesepubs.surrey.ac.uk/810698/1/Armin_CVPR_OpenAccess.pdf · 2016-05-13 · IEEE Int. Conf. Computer Vision & Pattern Recognition

Synthesis of Spatially & Temporally Disaggregate …orfe.princeton.edu/~alaink/NJ_aTaxiOrf467F12/Mufti MSE... · Web viewSynthesis of Spatially & Temporally Disaggregate Person Trip

TEMPORALLY CONSISTENT REGION-BASED VIDEO EXPOSURE CORRECTION › ~ayuille › Pubs15 › Temporally... · TEMPORALLY CONSISTENT REGION-BASED VIDEO EXPOSURE CORRECTION Xuan Dong1,

A temporally abstracted Viterbi algorithm (TAV)

Emergence and equilibration of zonal winds in turbulent ...users.uoa.gr/.../constantinou-ioannou-comecap-2014.pdf · 1 Introduction . Spatially and temporally coherent jets are a

TEMPORALLY COHERENT 4D RECONSTRUCTION OF COMPLEX …cvssp.org/projects/4d/4DRecon/ppt.pdf · 2016. 7. 5. · Dance1 326 493 295 254 Magician 311 608 377 325 Odzemok 381 598 394 363

Bi-modal First Impressions Recognition using Temporally ...innovarul.github.io/docs/ECCV_2016_APA_presentation.pdfBi-modal First Impressions Recognition using Temporally Ordered Deep

Temporally Coherent Clustering of Student Data · Model selection Determine the number of clusters at each time step. 34 Model selection ... (2002), Model Selection and Multimodel

Learning Input Correlations through Nonlinear Temporally

Temporally Extended Actions For

Design of task-specific optical systems using broadband ......optical networks, the input light was both spatially and temporally coherent, i.e., utilized a monochromatic plane wave

Temporally downscaling precipitation intensity factors for

Synthesis of Spatially & Temporally Disaggregate …alaink/Orf467F12/NJ_aTaxisOrf467F12... · Web viewSynthesis of Spatially & Temporally Disaggregate Person Trip Demand: Application

Temporally Coherent Completion of Dynamic Shapes Hao Li, Linjie Luo, Daniel Vlasic, Pieter Peers, Jovan Popović, Mark Pauly, Szymon Rusinkiewicz ACM Transactions

Experimental Results for Temporally Overlapping … Results for Temporally Overlapping Pulses from Quantel EverGreen ... Experimental Results for Temporally Overlapping Pulses from

Temporally Coherent Interpretations for Long Videos Using ... · a low-level layer in which feature observations provide ev-idence for concepts from the top layers, such as sub-events

Conditional Generative Adversarial Networks (cGANs) for ...jmcauley/workshops/scmls...Remote Sens. 2019, 11, 2193 3 of 17 deal with spatially and temporally coherent datasets [31,34]

Temporally Coherent Completion of Dynamic Shapes