Unsupervised and weakly-supervised discovery of events in video (and audio), Fernando De la Torre


Unsupervised and weakly-supervised discovery of events in video (and audio)

Fernando De la Torre


A dream


Outline

• Introduction
• CMU-Multimodal Activity database
• Unsupervised discovery of video events
– Aligned Cluster Analysis (ACA)
• Weakly-supervised discovery of video events
– Detection-Segmentation SVMs
• Conclusions


Quality of life technologies (QoLT)


Multimodal data collection
• 40 subjects, 5 recipes
• www.kitchen.cs.cmu.edu




Anomalous dataset


Time series analysis

• Anomaly detection formulated as detecting outliers in multimodal time series.
– Supervised
– Unsupervised
– Semi-supervised or weakly supervised
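As a minimal sketch of the unsupervised case, outliers in a multichannel time series can be flagged with a robust z-score (median and MAD per channel). This is a generic baseline chosen for illustration, not the method used in the talk; the function name, the threshold of 3.5 and the synthetic data are all assumptions.

```python
import numpy as np

def robust_outlier_frames(X, threshold=3.5):
    """Return indices of frames whose robust z-score exceeds `threshold` in any channel.

    X has shape (n_frames, n_channels), one column per sensor channel.
    """
    X = np.asarray(X, dtype=float)
    median = np.median(X, axis=0)
    mad = np.median(np.abs(X - median), axis=0)
    mad = np.where(mad == 0, 1e-12, mad)        # avoid division by zero
    z = 0.6745 * (X - median) / mad             # 0.6745 rescales MAD to about 1 std for Gaussian data
    return np.where(np.any(np.abs(z) > threshold, axis=1))[0]

# Hypothetical data: 200 frames, 3 channels, one injected anomaly at frame 50.
rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 3))
signal[50] += 25.0
outliers = robust_outlier_frames(signal)
```

The supervised and weakly supervised variants would replace the fixed threshold with a learned decision rule.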




Unsupervised discovery of events in video


Motivation

• Mining facial expression for one subject



• Summarization

• Visualization

• Indexing



[Figure: facial event labels: looking up, sleeping, smiling, looking forward, waking up]


Motivation

• Summarization

• Embedding

• Indexing





Related work in time series

• Change point detection (e.g. Page ’54, Stephens ’94, Lai ’95, Ge and Smyth ’00, Steyvers & Brown ’05, Murphy et al. ’07, Harchaoui et al. ’08)

• Segmental HMMs (e.g. Ge and Smyth ’00, Kohlmorgen et al. ’01, Ding & Fan ’07)

• Mixtures of HMMs (e.g. Fine et al. ’98, Murphy & Paskin ’01, Oliver et al. ’02, Alon et al. ’03)

• Switching LDS (e.g. Pavlovic et al. ’00, Oh et al. ’08, Turaga et al. ’09)

• Hierarchical Dirichlet Process (e.g. Beal et al. ’02, Fox et al. ’08)

• Aligned Cluster Analysis (ACA)


Summarization with ACA


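Kernel k-means and spectral clustering (Ding et al. ’02, Dhillon et al. ’04, Zass and Shashua ’05, De la Torre ’06)

$J(\mathbf{M}, \mathbf{G}) = \|\mathbf{X} - \mathbf{M}\mathbf{G}\|_F^2$

[Figure: ten 2-D samples (axes x, y) forming X, with the indicator matrix G and the cluster means M]

$J(\mathbf{G}) = \mathrm{tr}\big(\mathbf{K}\,(\mathbf{I}_n - \mathbf{G}^T(\mathbf{G}\mathbf{G}^T)^{-1}\mathbf{G})\big)$, with $\mathbf{K} = \phi(\mathbf{X})^T\phi(\mathbf{X})$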
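The two forms of the k-means energy can be checked numerically. The sketch below assumes a linear kernel (so that φ(X) = X), a hypothetical labelling of ten 2-D samples into three clusters, and cluster means obtained in closed form as M = X Gᵀ(G Gᵀ)⁻¹; under those assumptions the Frobenius form and the trace form should agree.

```python
import numpy as np

def kkm_objective(K, G):
    """J(G) = tr(K (I_n - G^T (G G^T)^{-1} G)) for a k x n 0/1 indicator matrix G."""
    n = K.shape[0]
    L = np.eye(n) - G.T @ np.linalg.inv(G @ G.T) @ G
    return np.trace(K @ L)

def kmeans_objective(X, G):
    """J(M, G) = ||X - M G||_F^2 with M set to the cluster means."""
    M = X @ G.T @ np.linalg.inv(G @ G.T)     # d x k matrix of cluster means
    return np.linalg.norm(X - M @ G, "fro") ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 10))                  # 10 two-dimensional samples as columns
labels = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])   # hypothetical cluster labels
G = np.zeros((3, 10))
G[labels, np.arange(10)] = 1.0                # G[c, j] = 1 if sample j is in cluster c
K = X.T @ X                                   # linear kernel
j_trace = kkm_objective(K, G)
j_frob = kmeans_objective(X, G)
```

With a nonlinear kernel only K changes; the trace form never needs the explicit feature map.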


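Problem formulation for ACA

The sequence X is split into m contiguous segments:

$\mathbf{X}_{[s_1,s_2)},\ \mathbf{X}_{[s_2,s_3)},\ \dots,\ \mathbf{X}_{[s_m,s_{m+1})}$

• Labels (G)
• Start and end of the segments (s; labelled $h_1, h_2, \dots, h_{m+1}$ in the slide figure)
• Dynamic Time Alignment Kernel (Shimodaira et al. ’01)

$J_{aca}(\mathbf{G}, \mathbf{M}, \mathbf{s}) = \sum_{c=1}^{k}\sum_{i=1}^{m} g_{ci}\,\big\|\psi(\mathbf{X}_{[s_i,s_{i+1})}) - \mathbf{m}_c\big\|_2^2 = \big\|[\psi(\mathbf{X}_{[s_1,s_2)}),\ \dots,\ \psi(\mathbf{X}_{[s_m,s_{m+1})})] - \mathbf{M}\mathbf{G}\big\|_F^2$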
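The dynamic time alignment kernel is what lets segments of different lengths be compared. The sketch below follows the usual dynamic-programming recursion for this kind of kernel (diagonal steps weighted 2, horizontal and vertical steps weighted 1, normalized by the summed lengths); the Gaussian frame kernel, the `sigma` parameter and the function name are assumptions for illustration, not details taken from the slides.

```python
import numpy as np

def dtak(X, Y, sigma=1.0):
    """Dynamic time alignment kernel between two segments whose columns are frames."""
    nx, ny = X.shape[1], Y.shape[1]
    # Frame-level Gaussian kernel: kappa[i, j] compares frame i of X with frame j of Y.
    sq = ((X[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)
    kappa = np.exp(-sq / (2.0 * sigma ** 2))
    U = np.full((nx + 1, ny + 1), -np.inf)    # U[i, j]: best alignment score of the first i and j frames
    U[0, 0] = 0.0
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            k = kappa[i - 1, j - 1]
            U[i, j] = max(U[i - 1, j] + k,            # advance in X only
                          U[i - 1, j - 1] + 2.0 * k,  # advance in both
                          U[i, j - 1] + k)            # advance in Y only
    return U[nx, ny] / (nx + ny)

a = np.array([[0.0, 1.0, 2.0, 3.0]])
b = np.array([[0.0, 0.0, 1.0, 2.0, 2.0, 3.0]])   # time-warped copy of a
c = np.array([[9.0, 9.0, 9.0]])                   # unrelated segment
sim_ab = dtak(a, b)
sim_ac = dtak(a, c)
```

A segment and its time-warped copy score close to 1, while the unrelated segment scores near 0, which is the behaviour the segment-level clustering relies on.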


Matrix formulation for ACA

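Kernel k-means: $J_{kkm} = \mathrm{tr}(\mathbf{L}\mathbf{K})$ with $\mathbf{L} = \mathbf{I}_n - \mathbf{G}^T(\mathbf{G}\mathbf{G}^T)^{-1}\mathbf{G}$

ACA: $J_{aca} = \mathrm{tr}\big((\mathbf{L} \circ \mathbf{W})\,\mathbf{K}\big)$ with $\mathbf{L} = \mathbf{I}_n - \mathbf{H}^T\mathbf{G}^T(\mathbf{G}\mathbf{G}^T)^{-1}\mathbf{G}\mathbf{H}$

where $\mathbf{K} = \phi(\mathbf{X})^T\phi(\mathbf{X})$

Example: 23 frames, 7 segments, 3 clusters

• $\mathbf{G} \in \{0,1\}^{3 \times 7}$ (clusters × segments)
• $\mathbf{H} \in \{0,1\}^{7 \times 23}$ (segments × samples)
• $\mathbf{W} \in \mathbb{R}^{23 \times 23}$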
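To make the matrix sizes concrete, the sketch below builds G and H for the 23-frame, 7-segment, 3-cluster example and evaluates the trace form. The segment boundaries and cluster labels are hypothetical, the kernel is linear, and W is set to all ones purely as a placeholder (in ACA it is assumed to come from the segment alignments), so the resulting number only illustrates the algebra.

```python
import numpy as np

n, m, k = 23, 7, 3
boundaries = np.array([0, 3, 7, 10, 14, 17, 20, 23])   # hypothetical s_1 .. s_{m+1} (0-based)
segment_labels = np.array([0, 1, 2, 0, 1, 2, 0])        # hypothetical cluster of each segment

H = np.zeros((m, n))                                    # segments x samples
for i in range(m):
    H[i, boundaries[i]:boundaries[i + 1]] = 1.0         # frames belonging to segment i

G = np.zeros((k, m))                                    # clusters x segments
G[segment_labels, np.arange(m)] = 1.0

L = np.eye(n) - H.T @ G.T @ np.linalg.inv(G @ G.T) @ G @ H

rng = np.random.default_rng(0)
X = rng.normal(size=(2, n))                             # hypothetical 2-D frame features
K = X.T @ X                                             # linear frame kernel
W = np.ones((n, n))                                     # placeholder weights, not the real W
J_aca = np.trace((L * W) @ K)                           # L * W is the elementwise (Hadamard) product
```

Each column of H and of G contains exactly one 1, which encodes that every frame belongs to one segment and every segment to one cluster.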


Facial image features

• Active Appearance Models (Baker and Matthews ’04)

• Image features

[Figure: shape and appearance features for the upper face and the lower face]


Unsupervised facial event discovery


Facial event discovery across subjects

• Cohn-Kanade: 30 people and five different expressions (surprise, joy, sadness, fear, anger)
• 10 sets of 30 people

ACA: 0.87 (.05)
Spectral Clustering (SC): 0.56 (.04)


Honey bee dance (Oh et al. ’08)

Three behaviors: 1-waggling, 2-turning left, 3-turning right

Method                             Seq 1  Seq 2  Seq 3  Seq 4  Seq 5  Seq 6
ACA                                0.845  0.925  0.600  0.922  0.878  0.928
PS-SLDS (Oh et al. ’08)            0.759  0.924  0.831  0.934  0.904  0.910
HDP-VAR(1)-HMM (Fox et al. ’08)    0.465  0.441  0.456  0.832  0.932  0.887
Spectral Clustering                0.698  0.631  0.509  0.671  0.577  0.649


Clustering human motion


Weakly supervised discovery of events in images and video


Spot the differences!


What distinguishes these images?


Classification of time series


Similarity of these problems?

• Global statistics are not distinctive enough!
• Better understanding of the discriminative regions or events


Image → bag of ‘regions’

• Positive bag: at least one positive region

• Negative bag: all regions negative


Support vector machines (SVMs)

[Figure: SVM illustration with the term $\tfrac{1}{2}\|\mathbf{w}\|^2$]


Learning formulation

• Standard SVM

(Andrews et al. ’03, Felzenszwalb et al. ’08)


Optimization

• All possible subwindows: 100 ms/image (480×640 pixels) (Lampert et al. CVPR ’08)

1)

2)

3) SVM with QP


Discriminative patterns in time series

At most k disjoint intervals

We name it: k-segmentation

• Efficient search: Global optimum guaranteed!

10ms/sequence (15000 frames)
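One way to search for at most k disjoint intervals with a global-optimum guarantee is a dynamic program over per-frame scores. The sketch below is a generic O(kn) formulation written for illustration, not necessarily the algorithm behind the timing quoted above; the function name and the example scores are assumptions.

```python
def k_segmentation(scores, k):
    """Pick at most k disjoint intervals maximizing the total score.

    Returns (best_total, intervals) with half-open (start, end) index pairs.
    """
    n = len(scores)
    neg_inf = float("-inf")
    # best[j][t]: best total over the first t frames using at most j intervals.
    # end[j][t]:  same, but frame t-1 must be the last frame of the j-th interval.
    best = [[0.0] * (n + 1) for _ in range(k + 1)]
    end = [[neg_inf] * (n + 1) for _ in range(k + 1)]
    for j in range(1, k + 1):
        for t in range(1, n + 1):
            # Either extend the current interval or start a new one at frame t-1.
            end[j][t] = scores[t - 1] + max(end[j][t - 1], best[j - 1][t - 1])
            best[j][t] = max(best[j][t - 1], end[j][t])

    # Backtrack to recover the chosen intervals.
    intervals = []
    j, t = k, n
    while j > 0 and t > 0:
        if best[j][t] == best[j][t - 1]:
            t -= 1                       # frame t-1 is not needed
            continue
        stop = t                         # frame t-1 closes the j-th interval
        while end[j][t] == scores[t - 1] + end[j][t - 1]:
            t -= 1                       # the interval extends further left
        t -= 1                           # t is now the start index of the interval
        intervals.append((t, stop))
        j -= 1
    intervals.reverse()
    return best[k][n], intervals

frame_scores = [-1, 2, 3, -4, 5, -1, -1, 4, -6]   # hypothetical per-frame scores
total_1, segs_1 = k_segmentation(frame_scores, 1)
total_2, segs_2 = k_segmentation(frame_scores, 2)
total_3, segs_3 = k_segmentation(frame_scores, 3)
```

Because the tables hold "at most j" intervals, a sequence with only negative scores yields an empty selection with total 0.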


Representation of signals

• Training data
• Compute frame-level feature vectors
• Clustering → visual dictionary
• IDs of visual words: 5,10,97,...,9,42,10,91
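The pipeline above (frame-level features, clustering into a visual dictionary, then one word ID per frame) can be sketched with a plain Lloyd k-means in NumPy. The feature dimension, dictionary size and synthetic training data below are hypothetical; a real system would use its own features and a much larger vocabulary.

```python
import numpy as np

def quantize(features, centers):
    """Map each frame-level feature vector to the ID of its nearest visual word."""
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_dictionary(features, n_words, n_iter=20, seed=0):
    """Plain Lloyd k-means; returns an (n_words, d) array of visual words."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=n_words, replace=False)].copy()
    for _ in range(n_iter):
        ids = quantize(features, centers)
        for w in range(n_words):
            members = features[ids == w]
            if len(members) > 0:             # keep the old center if a word is empty
                centers[w] = members.mean(axis=0)
    return centers

# Hypothetical training data: 300 frames of 4-D features drawn around 3 modes.
rng = np.random.default_rng(1)
modes = np.array([[0, 0, 0, 0], [5, 5, 5, 5], [-5, 5, -5, 5]], dtype=float)
train = np.concatenate([mode + 0.3 * rng.normal(size=(100, 4)) for mode in modes])

dictionary = build_dictionary(train, n_words=3)
word_ids = quantize(train, dictionary)           # one visual-word ID per frame
histogram = np.bincount(word_ids, minlength=3)   # bag-of-words histogram of the sequence
```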


K-segmentation

Original signal

IDs of visual words: 40,13,10,5,10,97,...,9,42,10,91

Histogram of visual words

We need:


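What is $\mathbf{w}^T\varphi(\mathbf{x})$?

SVM parameters: $\mathbf{w}$

Original signal (x)

IDs of visual words: 40,13,10,5,10,97,...,9,42,10,91

$\mathbf{w}^T\varphi(\mathbf{x}) = \sum_{i} w_{x_i}$, where $x_i$ is the visual-word ID of frame i

$w_{40}, w_{13}, w_{10}, w_{5}, w_{10}, w_{97}, \dots, w_{9}, w_{42}, w_{10}, w_{91}$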
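Because the histogram of visual words is linear in the frames, the SVM score of any interval equals the sum of the per-frame weights it covers, which prefix sums make available in constant time. In the sketch below the weight vector is random and the vocabulary size of 100 is an assumption; the ID sequence is the one shown above with its elided middle part simply left out.

```python
import numpy as np

word_ids = np.array([40, 13, 10, 5, 10, 97, 9, 42, 10, 91])
n_words = 100                          # assumed vocabulary size
rng = np.random.default_rng(0)
w = rng.normal(size=n_words)           # hypothetical SVM weights, one per visual word

frame_scores = w[word_ids]             # w_40, w_13, w_10, ...
prefix = np.concatenate([[0.0], np.cumsum(frame_scores)])

def interval_score(start, end):
    """Score of frames [start, end) in O(1) via prefix sums."""
    return prefix[end] - prefix[start]

def histogram_score(start, end):
    """Same score computed as w^T (histogram of visual words in the interval)."""
    hist = np.bincount(word_ids[start:end], minlength=n_words)
    return float(w @ hist)

fast = interval_score(2, 7)
slow = histogram_score(2, 7)
```

The per-frame scores `frame_scores` are exactly the kind of input an interval search over the sequence would consume.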

m-segmentation (m+1)-segmentation

Consider m-segmentation:

Situation 1:

Situation 2:


Experiment 1 – glasses vs. no glasses
• 624 images, 20 people under different expressions/poses
• 8 people for training (126 sunglasses, 128 no glasses), 12 for testing (185 sunglasses, 185 no glasses)


Localization result


Experiment 2 – car vs. no car
• 400 images: half contain cars, the other half do not
• Each image: 10,000 SIFT descriptors and a vocabulary of 1,000 visual words


Localization result


Bad localization cases


Classification performance

Our method outperforms SVM with human labels!

[Figure: classification performance for the whole image, human labels, and discriminative regions]


Experiment 3 – synthetic data

Positive class

Negative class

Result

[Figure: accuracy plot; k is the maximum number of disjoint intervals]


Experiment 4 – mouse activity

• Mouse activities:
– Drinking, eating, exploring, grooming, sleeping


Result – F1 scores


Conclusions

• CMU Multimodal Activity database
• Unsupervised discovery of events in time series
– Aligned Cluster Analysis for summarization, indexing and visualization of time series
– Code online (www.humansensing.cs.cmu.edu)
– Open problems: automatic selection of the number of clusters
• Weakly-supervised discovery of events in time series
– DS-SVM
– Novel and efficient algorithm for time series
– Outperforms methods trained on human-labeled data
• Kernel methods are a fundamental framework for multimodal data fusion.


Thanks

Questions?
