ACA DS-SVM Conclusions Introduction CMU-MMAC Unsupervised and weakly-supervised discovery of events in video (and audio) Fernando De la Torre


Unsupervised and weakly-supervised discovery of events in video (and audio)

Fernando De la Torre


A dream


Outline

• Introduction
• CMU-Multimodal Activity database
• Unsupervised discovery of video events
  – Aligned Cluster Analysis (ACA)
• Weakly-supervised discovery of video events
  – Detection-Segmentation SVMs
• Conclusions


Quality of life technologies (QLoT)


Multimodal data collection

• 40 subjects, 5 recipes
• www.kitchen.cs.cmu.edu



Anomalous dataset


Time series analysis

• Anomaly detection formulated as detecting outliers in multimodal time series.
  – Supervised
  – Unsupervised
  – Semi-supervised or weakly supervised
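As a rough illustration of the unsupervised case, the sketch below flags outlier frames in a multichannel time series with a robust z-score (median and MAD per channel). The function name, the threshold of 3.5 and the synthetic data are assumptions made for this example; the talk does not specify a particular detector.

```python
import numpy as np

def robust_outlier_frames(series, threshold=3.5):
    """Flag frames whose robust z-score exceeds `threshold` in any channel.

    series: array of shape (n_frames, n_channels).
    Returns a boolean array of shape (n_frames,).
    """
    series = np.asarray(series, dtype=float)
    median = np.median(series, axis=0)
    # Median absolute deviation per channel; guard against zero spread.
    mad = np.median(np.abs(series - median), axis=0)
    mad = np.where(mad == 0, 1e-12, mad)
    # 0.6745 rescales the MAD so that z is comparable to a standard z-score.
    z = 0.6745 * (series - median) / mad
    return np.any(np.abs(z) > threshold, axis=1)

# Hypothetical data: 200 frames, 3 channels, one injected anomaly at frame 50.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
data[50] += 25.0
flags = robust_outlier_frames(data)
```

A per-frame detector like this ignores temporal structure, which is one reason the rest of the talk works with segments rather than single frames.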



Unsupervised discovery of events in video


Motivation

• Mining facial expressions of one subject
• Summarization
• Visualization
• Embedding
• Indexing

[Figure labels: Looking up, Sleeping, Smiling, Looking forward, Waking up]


Related work in time series

• Change point detection (e.g. Page ‘54, Stephens ‘94, Lai ‘95, Ge and Smyth ‘00, Steyvers & Brown ‘05, Murphy et al. ‘07, Harchaoui et al. ‘08)

• Segmental HMMs (e.g. Ge and Smyth ‘00, Kohlmorgen et al. ‘01, Ding & Fan ‘07)

• Mixtures of HMMs (e.g. Fine et al. ‘98, Murphy & Paskin ‘01, Oliver et al. ‘02, Alon et al. ‘03)

• Switching LDS (e.g. Pavlovic et al. ‘00, Oh et al. ‘08, Turaga et al. ‘09)

• Hierarchical Dirichlet Process (e.g. Beal et al. ‘02, Fox et al. ‘08)

• Aligned Cluster Analysis (ACA)


Summarization with ACA


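Kernel k-means and spectral clustering (Ding et al. ‘02, Dhillon et al. ‘04, Zass and Shashua ‘05, De la Torre ‘06)

$$J(\mathbf{M},\mathbf{G}) = \|\mathbf{X} - \mathbf{M}\mathbf{G}^T\|_F^2$$

$$J(\mathbf{G}) = \operatorname{tr}\big((\mathbf{I}_n - \mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T)\,\mathbf{K}\big), \qquad \mathbf{K} = \phi(\mathbf{X})^T\phi(\mathbf{X})$$

[Figure: toy example with 10 points in the (x, y) plane, showing the data X, the cluster means M and the indicator matrix G]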
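The kernel k-means cost on this slide, J(G) = tr((I_n − G(GᵀG)⁻¹Gᵀ)K), can be checked numerically against the ordinary k-means cost when K is the linear kernel XᵀX. The sketch below is only an illustration under that assumption; the function names, the random data and the fixed labels are hypothetical, and every cluster is assumed non-empty so that GᵀG is invertible.

```python
import numpy as np

def kernel_kmeans_cost(K, labels, n_clusters):
    """tr((I_n - G (G^T G)^{-1} G^T) K) for a hard assignment `labels`."""
    n = K.shape[0]
    G = np.zeros((n, n_clusters))
    G[np.arange(n), labels] = 1.0          # n x k indicator matrix
    L = np.eye(n) - G @ np.linalg.inv(G.T @ G) @ G.T
    return float(np.trace(L @ K))

def kmeans_cost(X, labels, n_clusters):
    """Sum of squared distances to the cluster means; X is d x n."""
    cost = 0.0
    for c in range(n_clusters):
        Xc = X[:, labels == c]
        cost += np.sum((Xc - Xc.mean(axis=1, keepdims=True)) ** 2)
    return float(cost)

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 10))               # 10 points in the plane
labels = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
K = X.T @ X                                # linear kernel
cost_kernel = kernel_kmeans_cost(K, labels, 3)
cost_plain = kmeans_cost(X, labels, 3)
```

With the linear kernel the two costs agree up to rounding, which is why the trace form lets any other kernel matrix K be substituted without touching the means M.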


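Problem formulation for ACA

$$J_{aca}(\mathbf{M},\mathbf{G},\mathbf{h}) = \big\|\psi\big(\mathbf{X}_{[h_1..h_2)}, \mathbf{X}_{[h_2..h_3)}, \ldots, \mathbf{X}_{[h_m..h_{m+1})}\big) - \mathbf{M}\mathbf{G}^T\big\|_F^2$$

Labels (G)

Start and end of the segments (h)

Dynamic Time Alignment Kernel (Shimodaira et al. ‘01)

[Figure: a sequence cut at $h_1, h_2, h_3, h_4, \ldots, h_m, h_{m+1}$ into the segments $\mathbf{X}_{[h_1..h_2)}, \mathbf{X}_{[h_2..h_3)}, \ldots, \mathbf{X}_{[h_m..h_{m+1})}$]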


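Problem formulation for ACA

$$J_{aca}(\mathbf{M},\mathbf{G},\mathbf{S}) = \big\|\psi\big(\mathbf{X}_{[s_1..s_2)}, \mathbf{X}_{[s_2..s_3)}, \ldots, \mathbf{X}_{[s_m..s_{m+1})}\big) - \mathbf{M}\mathbf{G}^T\big\|_F^2 = \sum_{c=1}^{k}\sum_{i=1}^{m} g_{ic}\,\big\|\psi\big(\mathbf{X}_{[s_i..s_{i+1})}\big) - \mathbf{m}_c\big\|_2^2$$

[Figure: squared distance between a segment $\mathbf{X}_{[s_i, s_{i+1})}$ and a cluster mean $\mathbf{m}_c$, computed with the Dynamic Time Alignment Kernel (Shimodaira et al. ‘01)]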


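Matrix formulation for ACA

Kernel k-means:

$$J_{kkm} = \operatorname{tr}(\mathbf{L}\mathbf{K}) \quad \text{with} \quad \mathbf{L} = \mathbf{I}_n - \mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T$$

ACA:

$$J_{aca} = \operatorname{tr}\big((\mathbf{L} \circ \mathbf{W})\,\mathbf{K}\big) \quad \text{with} \quad \mathbf{L} = \mathbf{I}_n - \mathbf{H}\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T\mathbf{H}^T$$

$$\mathbf{K} = \phi(\mathbf{X})^T\phi(\mathbf{X}), \quad \mathbf{W} \in \mathbb{R}^{23 \times 23}, \quad \mathbf{G} \in \{0,1\}^{7 \times 3}, \quad \mathbf{H} \in \{0,1\}^{23 \times 7}$$

Example: 23 frames, 3 clusters

[Figure: H (samples × segments) and G (segments × clusters); Dynamic Time Alignment Kernel (Shimodaira et al. ‘01)]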
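To make the role of the Dynamic Time Alignment Kernel more concrete, here is a minimal sketch that compares variable-length segments with a DTAK-style recursion and then evaluates a kernel k-means cost on the resulting segment-by-segment kernel matrix. Several things are assumptions for illustration only: the Gaussian frame kernel and its width `sigma`, the specific recursion (one commonly used form of DTAK), the segment-level way of writing the cost, and all function names and toy data.

```python
import numpy as np

def dtak(X, Y, sigma=1.0):
    """DTAK-style similarity between segments X (nx, d) and Y (ny, d)."""
    nx, ny = len(X), len(Y)
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    kappa = np.exp(-sq / (2.0 * sigma ** 2))      # frame-to-frame kernel
    U = np.full((nx + 1, ny + 1), -np.inf)
    U[0, 0] = 0.0
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            k = kappa[i - 1, j - 1]
            # Diagonal steps count twice, as in dynamic time warping.
            U[i, j] = max(U[i - 1, j] + k,
                          U[i - 1, j - 1] + 2.0 * k,
                          U[i, j - 1] + k)
    return U[nx, ny] / (nx + ny)

def aca_cost(X, boundaries, labels, n_clusters, sigma=1.0):
    """Kernel k-means cost over the segments X[s_i:s_{i+1}] (half-open)."""
    segments = [X[s:e] for s, e in zip(boundaries[:-1], boundaries[1:])]
    m = len(segments)
    T = np.array([[dtak(a, b, sigma) for b in segments] for a in segments])
    G = np.zeros((m, n_clusters))
    G[np.arange(m), labels] = 1.0                 # segments x clusters
    L = np.eye(m) - G @ np.linalg.inv(G.T @ G) @ G.T
    return float(np.trace(L @ T))

# Toy sequence: 23 one-dimensional frames, 7 segments, 3 clusters.
rng = np.random.default_rng(0)
X = rng.normal(size=(23, 1))
boundaries = [0, 3, 7, 10, 14, 17, 20, 23]
labels = [0, 1, 2, 0, 1, 2, 0]
cost = aca_cost(X, boundaries, labels, 3)
```

ACA itself optimizes jointly over the boundaries and the labels; this sketch only scores one fixed candidate segmentation.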


Facial image features

• Active Appearance Models (Baker and Matthews ‘04)
• Image features: shape, appearance

[Figure labels: Upper face, Lower face]


Unsupervised facial event discovery


Facial event discovery across subjects

• Cohn-Kanade: 30 people and five different expressions (surprise, joy, sadness, fear, anger)
• 10 sets of 30 people

| ACA | Spectral Clustering (SC) |
|---|---|
| 0.87 (.05) | 0.56 (.04) |


Honey bee dance (Oh et al. ‘08)

Three behaviors: 1-waggling, 2-turning left, 3-turning right

| Method | Seq 1 | Seq 2 | Seq 3 | Seq 4 | Seq 5 | Seq 6 |
|---|---|---|---|---|---|---|
| ACA | 0.845 | 0.925 | 0.600 | 0.922 | 0.878 | 0.928 |
| PS-SLDS (Oh et al. ‘08) | 0.759 | 0.924 | 0.831 | 0.934 | 0.904 | 0.910 |
| HDP-VAR(1)-HMM (Fox et al. ‘08) | 0.465 | 0.441 | 0.456 | 0.832 | 0.932 | 0.887 |
| Spectral Clustering | 0.698 | 0.631 | 0.509 | 0.671 | 0.577 | 0.649 |


Clustering human motion


Weakly supervised discovery of events in images and video


Spot the differences!


What distinguishes these images?


Classification of time series


Similarity of these problems?

• Global statistics are not distinctive enough!
• Better understanding of the discriminative regions or events


Image → Bag of ‘regions’

• Positive image: at least one positive region
• Negative image: all regions negative
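The bag-of-regions labeling rule (a positive image has at least one positive region, a negative image has none) can be written as a maximum over region scores. The sketch below assumes a hypothetical linear scorer with weights `w` and bias `b` and hand-made region features; it shows the usual multiple-instance reading of the rule, not necessarily the exact formulation used in the talk.

```python
import numpy as np

def bag_score(w, b, regions):
    """Score a bag by its best region; also return that region's index."""
    scores = regions @ w + b
    best = int(np.argmax(scores))
    return float(scores[best]), best

def bag_label(w, b, regions):
    """+1 if at least one region scores positive, otherwise -1."""
    score, _ = bag_score(w, b, regions)
    return 1 if score > 0 else -1

# Hypothetical classifier and two bags of 2-D region features.
w = np.array([1.0, -1.0])
b = 0.0
positive_bag = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
negative_bag = np.array([[0.0, 1.0], [-1.0, 0.0]])
pos_score, pos_region = bag_score(w, b, positive_bag)
```

The index of the best-scoring region doubles as a localization of the discriminative region, which is what makes this formulation attractive for weakly labeled data.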


Support vector machines (SVMs)

[Figure: SVM illustration labeled with the weight vector w]


Learning formulation

• Standard SVM

(Andrews et al. ‘03, Felzenszwalb et al. ‘08)


Optimization

All possible subwindows: 100 ms/image (480×640 pixels) (Lampert et al., CVPR ‘08)

1)

2)


3) SVM with QP



Discriminative patterns in time series

At most k disjoint intervals

We name it: k-segmentation

• Efficient search: global optimum guaranteed!

10 ms/sequence (15,000 frames)


Representation of signals

• Training data: compute frame-level feature vectors
• Clustering the feature vectors gives a visual dictionary
• Each frame is represented by the ID of its visual word: 5,10,97,...,9,42,10,91
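The representation pipeline (frame-level features, clustering into a visual dictionary, then one word ID per frame) can be sketched with plain k-means. Everything specific here is an assumption for illustration: the random features, the dictionary size of 8, the number of iterations and the function names. A real system would more likely use an optimized clustering library.

```python
import numpy as np

def assign_words(features, centers):
    """ID of the nearest dictionary entry for every frame."""
    d = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def build_dictionary(features, n_words, n_iter=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on frame-level feature vectors."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=n_words, replace=False)
    centers = features[start].astype(float).copy()
    for _ in range(n_iter):
        ids = assign_words(features, centers)
        for k in range(n_words):
            members = features[ids == k]
            if len(members) > 0:           # keep old center if a word is empty
                centers[k] = members.mean(axis=0)
    return centers

def word_histogram(ids, n_words):
    """Histogram of visual words over a set of frames."""
    return np.bincount(ids, minlength=n_words)

rng = np.random.default_rng(1)
train_features = rng.normal(size=(300, 4))   # hypothetical training frames
dictionary = build_dictionary(train_features, n_words=8)
ids = assign_words(train_features, dictionary)
hist = word_histogram(ids, n_words=8)
```

Once each frame is a single integer ID, the histogram of any interval is just a count over the IDs it contains, which is what the search over intervals relies on.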


k-segmentation

• Original signal
• IDs of visual words: 40,13,10,5,10,97,...,9,42,10,91
• Histogram of visual words

We need:


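What is $\mathbf{w}^T\varphi(\mathbf{x})$?

• SVM parameters: $\mathbf{w}$
• Original signal: $\mathbf{x}$
• IDs of visual words: 40,13,10,5,10,97,...,9,42,10,91

$$\mathbf{w}^T\varphi(\mathbf{x}) = \sum_{i \in \mathbf{x}} w_i$$

where $i$ ranges over the visual-word IDs of the frames of $\mathbf{x}$, so each frame contributes one weight:

$$w_{40}, w_{13}, w_{10}, w_{5}, w_{10}, w_{97}, \ldots, w_{9}, w_{42}, w_{10}, w_{91}$$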

m-segmentation (m+1)-segmentation

Consider m-segmentation:

Situation 1:

Situation 2:
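Because the SVM score of a set of frames is the sum of the weights of their visual words, finding the best set of at most k disjoint intervals reduces to a maximum-sum problem over per-frame scores. The sketch below solves it with a textbook O(kn) dynamic program. It is not necessarily the incremental m to (m+1) construction outlined on the slide, and the function name, the toy weights and the word IDs are assumptions for illustration.

```python
import numpy as np

def k_segmentation(frame_scores, k):
    """Best total score of at most k disjoint intervals, plus the intervals.

    Intervals are returned as half-open, 0-based (start, end) pairs.
    """
    a = np.asarray(frame_scores, dtype=float)
    n = len(a)
    # best[j, t]: best score using at most j intervals within the first t frames.
    # open_[j, t]: same, but frame t must be the last frame of an interval.
    best = np.zeros((k + 1, n + 1))
    open_ = np.full((k + 1, n + 1), -np.inf)
    for j in range(1, k + 1):
        for t in range(1, n + 1):
            open_[j, t] = a[t - 1] + max(open_[j, t - 1], best[j - 1, t - 1])
            best[j, t] = max(best[j, t - 1], open_[j, t])
    # Backtrack to recover the intervals.
    intervals = []
    j, t = k, n
    while j > 0 and t > 0:
        if best[j, t - 1] >= open_[j, t]:
            t -= 1                          # frame t is not selected
            continue
        end = t
        while open_[j, t - 1] > best[j - 1, t - 1]:
            t -= 1                          # interval extends to the left
        intervals.append((t - 1, end))
        t -= 1
        j -= 1
    intervals.reverse()
    return float(best[k, n]), intervals

# Hypothetical per-word SVM weights and a sequence of visual-word IDs.
w = np.array([1.0, -2.0, 3.0, -1.0, 2.0, -5.0, 4.0])
ids = np.array([0, 1, 2, 3, 4, 5, 6])
frame_scores = w[ids]                       # one weight per frame
value, intervals = k_segmentation(frame_scores, k=2)
```

Allowing "at most" k intervals means the empty selection, with score 0, is always feasible, so sequences whose frames all score negatively yield no intervals.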


Experiment 1 – glasses vs. no-glasses

• 624 images, 20 people under different expression/pose
• 8 people for training (126 sunglasses, 128 no glasses), 12 for testing (185 sunglasses and 185 no glasses)


Localization result


Experiment 2 – car vs. no car

• 400 images: half contain cars and the other half contain no cars.
• Each image: 10,000 SIFT descriptors and a vocabulary of 1,000 visual words.


Localization result


Bad localization cases


Classification performance

Our method outperforms SVM with human labels!

[Figure labels: whole image, discriminative regions, human labels]


Experiment 3 – synthetic data

Positive class

Negative class

Result

k: maximum number of disjoint intervals.

[Plot axis: Accuracy]


Experiment 4 – mouse activity

• Mouse activities:
  – Drinking, eating, exploring, grooming, sleeping


Result – F1 scores


Conclusions

• CMU Multimodal Activity database
• Unsupervised discovery of events in time-series
  – Aligned Cluster Analysis for summarization, indexing and visualization of time-series
  – Code online (www.humansensing.cs.cmu.edu)
  – Open problems: automatic selection of number of clusters
• Weakly-supervised discovery of events in time-series
  – DS-SVM
  – Novel & efficient algorithm for time series
  – Outperforms methods with human labeled data
• Kernel methods are a fundamental framework for multimodal data fusion.


Thanks

Questions?