Agenda
• Introduction
• Feature set extension
• Video features processing
• Video features integration
• Preliminary results
• Conclusions
Meeting Structuring (1)
• Goal: recognise events which involve one or more communicative modalities:
• Monologue / Dialogue / Note taking / Presentation / Presentation at the whiteboard
• Working environment: the “IDIAP framework”:
  • 69 five-minute meetings with 4 participants
  • 30 transcribed meetings
  • Scripted meeting structure
Meeting Structuring (2)
• 3 audio-derived feature families: Speaker Turns, Prosodic Features, Lexical Features
[Diagram: mic. array → beam-forming → Speaker Turns; lapel mic. → rate of speech, pitch, energy → Prosody; lapel mic. → ASR → transcription → M/D discrimination → Lexical Features]
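The slides do not detail the prosody branch further; as a rough sketch (not the original IDIAP front-end), the pitch and energy measurements could be computed with librosa as follows. The wav path, sample rate and hop size are assumptions:

```python
# Minimal sketch of the prosody branch above: pitch and short-term
# energy from a lapel-microphone recording. Rate of speech and the
# ASR-based lexical features come from separate tools.
import librosa
import numpy as np

def prosodic_features(wav_path, sr=16000):
    y, sr = librosa.load(wav_path, sr=sr)
    # Fundamental frequency via the YIN estimator, ~10 ms hops.
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr, hop_length=160)
    # Short-term energy (RMS) over the same hop size.
    energy = librosa.feature.rms(y=y, hop_length=160)[0]
    n = min(len(f0), len(energy))
    return np.stack([f0[:n], energy[:n]], axis=1)  # (frames, 2)
```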
Meeting Structuring (3)
• Dynamic Bayesian Network based models (using GMTK, Bilmes et al.)
• Multi-stream processing (parallel stream processing)
• “Counter structure” (state duration modelling)
[Diagram: two-stream DBN unrolled over time, with per-stream state nodes S_t^1, S_t^2, observations Y_t^1, Y_t^2, a shared action node A_t, and counter-structure nodes C_t and E_t]
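As a hedged illustration of the counter idea (a toy HMM analogue, not the actual GMTK DBN of the diagram), each action state can be expanded into (action, counter) sub-states so that state durations are modelled explicitly rather than memorylessly; `max_count` and `p_stay` below are illustrative parameters:

```python
# Toy "counter structure": each meeting action is expanded into
# (action, counter) sub-states so that self-transitions advance a
# saturating duration counter. Assumes n_actions > 1.
import numpy as np

def counter_transition(n_actions, max_count, p_stay=0.9):
    """Transition matrix over (action, counter) pairs."""
    n = n_actions * max_count
    T = np.zeros((n, n))
    for a in range(n_actions):
        for c in range(max_count):
            s = a * max_count + c
            # Staying in the same action increments the counter.
            T[s, a * max_count + min(c + 1, max_count - 1)] += p_stay
            # Leaving: jump to counter 0 of any other action.
            for b in range(n_actions):
                if b != a:
                    T[s, b * max_count] += (1 - p_stay) / (n_actions - 1)
    return T  # each row sums to 1
```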
              Corr  Sub  Del  Ins  AER
W/o counter   91.7  4.5  3.8  2.6  10.9
With counter  92.9  5.1  1.9  1.9   9.0
• 3 feature families:
  • Prosodic features (S1)
  • Speaker Turns (S2)
  • Lexical features (S3)
• Leave-one-out cross-validation over 30 annotated meetings
[Chart: Sub / Ins / Del error counts and AER against the total number of actions]
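The Corr/Sub/Del/Ins/AER figures above follow the usual WER-style scoring. A minimal sketch of how such numbers can be computed from a reference and a recognised action sequence (assuming a non-empty reference):

```python
# WER-style edit-distance alignment between the reference and the
# recognised sequence of meeting actions, tracking Sub/Del/Ins counts.
def action_error_rate(ref, hyp):
    n, m = len(ref), len(hyp)
    # d[i][j] = (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    d = [[None] * (m + 1) for _ in range(n + 1)]
    d[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        d[i][0] = (i, 0, i, 0)          # all deletions
    for j in range(1, m + 1):
        d[0][j] = (j, 0, 0, j)          # all insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            diag, up, left = d[i-1][j-1], d[i-1][j], d[i][j-1]
            d[i][j] = min(
                (diag[0] + sub, diag[1] + sub, diag[2], diag[3]),
                (up[0] + 1, up[1], up[2] + 1, up[3]),      # deletion
                (left[0] + 1, left[1], left[2], left[3] + 1),  # insertion
            )
    _, s, dl, ins = d[n][m]
    corr = 100.0 * (n - s - dl) / n
    aer = 100.0 * (s + dl + ins) / n
    return corr, aer  # (Corr %, AER %)
```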
Feature set extension (1)
Multi-party meetings are multi-modal communicative processes.
Our features cover only two modalities: audio (prosodic features & speaker turns) and lexical content (lexical monologue/dialogue discriminator).
Exploiting video content is the next step!
Approach: extract low-level video features and leave their interpretation to high-level specialised models.
Feature set extension (2)
The three most confused symbols are also the three meeting actions which most heavily involve body/hand movements.
• Goal: improve the recognition of “Note taking”, “Presentation” and “Whiteboard”
Feature set extension (3)
We need motion features for the hands and head/torso regions.
• Constraints:
  – The system must be simple
  – Robust against “environmental” changes (lighting, backgrounds, …)
  – Open to further extensions / modifications
• Initial assumptions:
  – Meeting video content is quite “static”
  – Participants occupy only a few spatial regions and tend to stay there
  – The meeting room configuration (camera positions, seats, furniture, …) is fixed
Video feature extraction (1)
• Motion analysis is performed using Kanade–Lucas–Tomasi (KLT) feature tracking…
…and by partitioning the resulting trajectories according to their relative position in the scene.
Four spatial regions for each scene: Head 1/2, Hands 1/2.
KLT (1)
Assumption: the brightness of every point of a (slowly) moving or static object does not change between images taken at nearby time instants:
$$I(\mathbf{x} + d\mathbf{x},\, t + dt) = I(\mathbf{x}, t)$$
Taylor series, approximated to the 1st derivative:
$$I(\mathbf{x} + d\mathbf{x},\, t + dt) \approx I(\mathbf{x}, t) + \nabla I^{T} d\mathbf{x} + \frac{\partial I}{\partial t}\, dt$$
Combining the two gives the optical flow constraint equation:
$$\nabla I^{T}\, \frac{d\mathbf{x}}{dt} = -\frac{\partial I(\mathbf{x},t)}{\partial t}$$
where $\partial I(\mathbf{x},t)/\partial t$ represents how fast the intensity changes with time, $d\mathbf{x}/dt = (dx/dt,\; dy/dt)^{T}$ is the moving object speed, and $\nabla I$ is the brightness gradient.
This is one equation in two unknowns, hence it admits more than one solution.
KLT (2)
• Minimize the weighted least-squares error over a window $W$ of neighbour points of $\mathbf{x}$, assumed to share the same constant velocity:
$$\epsilon = \sum_{\mathbf{x} \in W} w(\mathbf{x})^{2} \left( \nabla I(\mathbf{x},t)^{T}\, \frac{d\mathbf{x}}{dt} + \frac{\partial I(\mathbf{x},t)}{\partial t} \right)^{2}$$
• In two dimensions the system has the form:
$$\sum_{(x,y) \in W} W(x,y)\; A \begin{pmatrix} dx/dt \\ dy/dt \end{pmatrix} = -\sum_{(x,y) \in W} W(x,y) \begin{pmatrix} \frac{\partial I}{\partial x}\frac{\partial I}{\partial t} \\ \frac{\partial I}{\partial y}\frac{\partial I}{\partial t} \end{pmatrix}, \qquad A = \begin{pmatrix} \left(\frac{\partial I}{\partial x}\right)^{2} & \frac{\partial I}{\partial x}\frac{\partial I}{\partial y} \\ \frac{\partial I}{\partial x}\frac{\partial I}{\partial y} & \left(\frac{\partial I}{\partial y}\right)^{2} \end{pmatrix}$$
• If $\det\left(\sum_{(x,y) \in W} W A\right) \neq 0$, the solution is:
$$\begin{pmatrix} dx/dt \\ dy/dt \end{pmatrix} = \left( \sum_{(x,y) \in W} W A \right)^{-1} \left( -\sum_{(x,y) \in W} W \begin{pmatrix} \frac{\partial I}{\partial x}\frac{\partial I}{\partial t} \\ \frac{\partial I}{\partial y}\frac{\partial I}{\partial t} \end{pmatrix} \right)$$
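A minimal numpy transcription of this solution for a single feature, assuming two grayscale frames as float arrays (`half = 3` gives the 7×7 window used here):

```python
# Estimate the velocity of one feature from spatial/temporal gradients
# in a small window W centred on (x, y) -- a direct transcription of
# the least-squares solution above, not a full tracker.
import numpy as np

def lk_velocity(prev, curr, x, y, half=3):
    Ix = np.gradient(prev, axis=1)[y-half:y+half+1, x-half:x+half+1].ravel()
    Iy = np.gradient(prev, axis=0)[y-half:y+half+1, x-half:x+half+1].ravel()
    It = (curr - prev)[y-half:y+half+1, x-half:x+half+1].ravel()
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    if np.linalg.det(A) == 0:       # ill-conditioned window: no solution
        return None
    return np.linalg.solve(A, b)    # (dx/dt, dy/dt) in pixels/frame
```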
KLT (3)
A good feature is:
1. one that can be tracked well… (Tomasi et al.): if $\lambda_1, \lambda_2$ are the eigenvalues of $A$, the system is well-conditioned if
$$\min(\lambda_1, \lambda_2) > Th$$
i.e. the eigenvalues are large (high texture content) but in the same range;
2. …and even better if it is part of a human body: pixels with a higher probability of being skin are preferred,
$$P(SKIN) > Th'$$
We decided to track n = 100 features; $W$ is a square (7×7) window.
KLT (4)
KLT feature tracking consists of 3 steps:
1. Select n good features: $\min_i(\lambda_i) > Th$ and $P(SKIN) > Th'$
2. Track the selected n features by solving
$$\begin{pmatrix} dx/dt \\ dy/dt \end{pmatrix} = \left( \sum_{(x,y) \in W} W A \right)^{-1} \left( -\sum_{(x,y) \in W} W\, \nabla I\, \frac{\partial I}{\partial t} \right)$$
3. Replace lost features
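The slides use their own tracker; a comparable 3-step loop can be sketched with OpenCV's KLT implementation. The skin term is approximated here by a (hypothetical) selection mask, since OpenCV's corner selector has no built-in skin weighting:

```python
# Sketch of the 3-step KLT loop with OpenCV: select, track, replace.
import cv2
import numpy as np

N_FEATURES = 100  # as in the slides

def track(video_path, skin_mask=None):
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Step 1: select n good features (min-eigenvalue criterion).
    pts = cv2.goodFeaturesToTrack(prev, maxCorners=N_FEATURES,
                                  qualityLevel=0.01, minDistance=7,
                                  mask=skin_mask)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Step 2: track the selected features frame-to-frame.
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev, gray, pts, None)
        pts = nxt[status.ravel() == 1].reshape(-1, 1, 2)
        # Step 3: replace lost features.
        if len(pts) < N_FEATURES:
            extra = cv2.goodFeaturesToTrack(gray,
                                            maxCorners=N_FEATURES - len(pts),
                                            qualityLevel=0.01, minDistance=7,
                                            mask=skin_mask)
            if extra is not None:
                pts = np.vstack([pts, extra])
        yield pts
        prev = gray
```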
Skin modelling
Color-based approach in the (Cr, Cb) chromatic subspace:
$$Y = 0.299\,R + 0.587\,G + 0.114\,B$$
$$Cr = V = 0.713\,(R - Y)$$
$$Cb = U = 0.564\,(B - Y)$$
Initial experiments were made using a single Gaussian; the current model is a 3-component Gaussian mixture. Skin samples were taken from unused meetings.
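A minimal sketch of such a skin model, assuming labelled skin pixels from held-out meetings are available as an (N, 3) RGB array; the log-likelihood threshold is illustrative:

```python
# 3-component GMM over the (Cr, Cb) chromatic subspace, trained on
# skin pixels sampled from unused meetings.
import numpy as np
from sklearn.mixture import GaussianMixture

def rgb_to_crcb(rgb):
    """Conversion matching the formulas above."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = 0.713 * (r - y)
    cb = 0.564 * (b - y)
    return np.stack([cr, cb], axis=-1)

def fit_skin_model(skin_pixels):
    """skin_pixels: (N, 3) RGB samples of labelled skin."""
    gmm = GaussianMixture(n_components=3, covariance_type='full')
    gmm.fit(rgb_to_crcb(skin_pixels))
    return gmm

def skin_mask(gmm, frame_rgb, log_thresh=-10.0):
    """Boolean mask of pixels whose P(SKIN) exceeds the threshold."""
    crcb = rgb_to_crcb(frame_rgb.astype(float)).reshape(-1, 2)
    return (gmm.score_samples(crcb) > log_thresh).reshape(frame_rgb.shape[:2])
```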
Video feature extraction (2)
Structure of the implemented system:
[Diagram: video → KLT feature tracker, assisted by skin detection driven by the skin model → trajectory structure; 100 features / frame, 100 trajectories / frame]
Video feature extraction (3)
Trajectory classification:
• Evaluate the average motion and remove long, quasi-static trajectories
• Define 4 partitions (regions): 2 × heads (H1, H2), 2 × hands (Ha1, Ha2)
• Define 2 additional fixed regions (L, R): 4 regions + 2 regions
[Diagram: scene partitioned into head regions H1/H2, hand regions Ha1/Ha2, and lateral regions L/R]
Video feature extraction (5)
For each scene, 4 motion vectors, one per region, are estimated (soon to be enhanced with 2 more regions/vectors, L and R, in order to detect whether someone is entering or leaving the scene). Taking motion vectors averaged over many trajectories helps reduce noise (see the sketch after this list).
Open issues:
• Loss of tracking for fast-moving objects
• P(SKIN) is taken into account only at feature selection, not during the tracking itself
• Assumption of a fixed scene structure
• Delayed/offline processing
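Putting slides (3) and (5) together, a minimal sketch of the trajectory post-processing, with hypothetical fixed region rectangles standing in for the real room layout:

```python
# Drop long quasi-static trajectories, assign the rest to the
# head/hand regions, and average their per-frame displacements into
# one motion vector per region. Region rectangles are assumed fixed
# (static room) and are illustrative values.
import numpy as np

REGIONS = {'H1': (50, 0, 200, 120),   'H2': (440, 0, 590, 120),
           'Ha1': (50, 120, 300, 300), 'Ha2': (340, 120, 590, 300)}

def region_motion(trajectories, min_motion=0.5):
    """trajectories: list of (T, 2) arrays of point positions per frame."""
    vecs = {name: [] for name in REGIONS}
    for traj in trajectories:
        steps = np.diff(traj, axis=0)
        if np.linalg.norm(steps, axis=1).mean() < min_motion:
            continue                  # long, quasi-static trajectory
        x, y = traj[-1]               # classify by current position
        for name, (x0, y0, x1, y1) in REGIONS.items():
            if x0 <= x < x1 and y0 <= y < y1:
                vecs[name].append(steps.mean(axis=0))
                break
    # Averaging over many trajectories suppresses per-feature noise.
    return {name: (np.mean(v, axis=0) if v else np.zeros(2))
            for name, v in vecs.items()}
```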
Integration
Goal: extend the multi-stream model with a new video stream.
[Diagram: four-stream DBN unrolled over time, with per-stream states S_t^1…S_t^4 and observations Y_t^1…Y_t^4 for the prosodic-feature, speaker-turn, lexical-feature and video-feature streams, a shared action node A_t, and counter-structure nodes C_t and E_t]
It is possible that the extended model will be intractable due to the increased state space. In this case:
• State-space reduction through a multi-time-scale approach will be attempted
• Early integration of Speaker Turns + Lexical Features will be investigated
Preliminary results
Before proceeding with the proposed integration we need to:
• compare video performance against the other feature families
• validate the extracted video features

Single-stream accuracy:

             Speaker  Prosodic  Lexical   Video
             Turns    Features  Features  Features
Accuracy %   85.9     69.9      52.6      48.1

Video features alone perform quite poorly, but they seem to be helpful when evaluated together with Speaker Turns:

(A) (Speaker Turns) + (Prosody + Lexical Features)
(B) (Speaker Turns) + (Video Features)

                      Corr  Sub  Del  Ins  AER
(A) Two-stream model  87.8  4.5  7.7  3.2  15.4
(B) Two-stream model  90.4  3.2  6.4  4.5  14.1
Summary
– Extraction of video features through:
  • a skin-detector-enhanced KLT feature tracker
  • segmentation of trajectories into 4/6 spatial regions
  (a simple and fast approach, but with some open problems)
– Validation of motion vectors as a video feature
– Integration into the existing framework (work in progress)