Unsupervised Framework for Interactions Modeling between Multiple Objects Ali Al-Raziqi, Joachim Denzler Computer Vision Group Department of Mathematics and Computer Science Friedrich Schiller University of Jena, Germany {Ali.Al-Raziqi,Joachim.Denzler}@uni-jena.de http://www.inf-cv.uni-jena.de/ March 4, 2016


Unsupervised Framework for Interactions Modeling between Multiple Objects

Ali Al-Raziqi, Joachim Denzler

Computer Vision Group
Department of Mathematics and Computer Science
Friedrich Schiller University of Jena, Germany

{Ali.Al-Raziqi,Joachim.Denzler}@uni-jena.de
http://www.inf-cv.uni-jena.de/

March 4, 2016

Introduction | Interaction Modeling | Experiments and Results | Conclusion

Friedrich Schiller University Jena

Computer Vision Group

Outline

1 Introduction

2 Interaction Modeling

3 Experiments and Results

4 Conclusion

Ali Al-Raziqi, Joachim Denzler Interactions Modeling 1 of 23


Introduction

Activity recognition

Activity datasets: [Gorelick, 2007; Ryoo, 2010; Blunsden, Scott, et al., 2010]


Motivation

Our goal is to build an unsupervised system that extracts the interactions between objects in a video sequence.

Current systems for modeling object interactions mostly rely on supervised learning methods.


Interaction Samples (InGroup and Fight)


Tracking

All cavies are tracked with a tracking-by-detection method: the cavies are first detected in each frame, and these detections are then associated across successive frames using the two-stage graph tracking approach of [Jiang, Xiaoyan, et al., 2012].


Optical Flow

The output of the tracking algorithm is represented as bounding boxes (BBs).
Optical flow inside the BB regions is computed using the TV-L1 algorithm [Zach, Christopher, et al., 2007].
One flow word is w = (x_i, y_i, u_i, v_i), with the flow quantized into eight directions.
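The construction of flow words described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `flow_words`, the spatial cell size `cell`, and the magnitude threshold `min_mag` are hypothetical, and the dense flow field is taken as given (the slides compute it with TV-L1).

```python
import math

def flow_words(flow, bbox, cell=10, min_mag=0.5):
    """Quantize the flow vectors inside one bounding box into flow words.

    flow: 2D list with flow[y][x] == (u, v), the flow vector at pixel (x, y).
    bbox: (x0, y0, x1, y1), with x1 and y1 exclusive.
    cell, min_mag: hypothetical parameters (spatial cell size in pixels and
    minimum flow magnitude); neither value is given in the slides.
    Returns a list of (cell_x, cell_y, direction) with direction in 0..7.
    """
    x0, y0, x1, y1 = bbox
    words = []
    for y in range(y0, y1):
        for x in range(x0, x1):
            u, v = flow[y][x]
            if math.hypot(u, v) < min_mag:
                continue  # skip pixels that are (almost) static
            angle = math.atan2(v, u)
            # Eight 45-degree sectors centred on 0, 45, 90, ... degrees.
            direction = round(angle / (math.pi / 4)) % 8
            words.append((x // cell, y // cell, direction))
    return words
```

Each distinct `(cell_x, cell_y, direction)` triple then plays the role of one dictionary entry.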


Clips

The videos are divided into clips of equal size.
Each clip is represented by its flow words, i.e. by a bag-of-words table of flow words and their counts.
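A minimal sketch of this step, assuming the flow words of every frame are already available as hashable values. The function name `clip_bags` and the parameter `frames_per_clip` are hypothetical; the slides do not state the clip length.

```python
from collections import Counter

def clip_bags(words_per_frame, frames_per_clip):
    """Group frames into consecutive clips and count the flow words per clip.

    words_per_frame: list with one list of flow words per frame.
    frames_per_clip: hypothetical clip length in frames.
    Returns one Counter (flow word -> count) per clip. The last clip is
    shorter when the number of frames is not a multiple of the clip length.
    """
    bags = []
    for start in range(0, len(words_per_frame), frames_per_clip):
        bag = Counter()
        for frame_words in words_per_frame[start:start + frames_per_clip]:
            bag.update(frame_words)
        bags.append(bag)
    return bags
```

In topic-model terms, each clip is a document and each flow word is a word of that document.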


[Figure: processing pipeline with the stages Sequence, Tracking, Flow Words Extraction, Dictionary, Clips, Bag-of-Words, HDP Model, Interactions]


Why topic models?

Assumption

Suppose you have a huge number of documents
Want to know what's going on
Can't read them all (e.g. every New York Times article from the 90's)
Topic models offer a way to get a corpus-level view of major themes
Unsupervised

Some slides are taken from Jordan Boyd-Graber with permission


Conceptual Approach

From an input corpus and number of topics K → words to topics

Corpus:
Forget the Bootleg, Just Download the Movie Legally
Multiplex Heralded As Linchpin To Growth
The Shape of Cinema, Transformed At the Click of a Mouse
A Peaceful Crew Puts Muppets Where Its Mouth Is
Stock Trades: A Better Deal For Investors Isn't Simple
The three big Internet portals begin to distinguish among themselves as shopping malls
Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens


Three resulting topics (labeled TOPIC 1, TOPIC 2, TOPIC 3 on the slide), each shown as a list of words:
computer, technology, system, service, site, phone, internet, machine
play, film, movie, theater, production, star, director, stage
sell, sale, store, product, business, advertising, market, consumer


Generative Model

Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's iTunes music store and other online services ...

computer, technology, system, service, site, phone, internet, machine
play, film, movie, theater, production, star, director, stage
sell, sale, store, product, business, advertising, market, consumer


Hierarchical Dirichlet Process (HDP)

HDP was originally designed for clustering words in documents based on word co-occurrences, not on distances in feature space [Teh, Yee Whye, 2006].
The number of clusters is deduced automatically from the data and the hyper-parameters.
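To give an intuition for how a number of clusters can emerge from the data and a hyper-parameter, the sketch below simulates a Chinese restaurant process, a standard way to illustrate a single Dirichlet process. It is not the HDP inference used in this work; the function name `crp_table_sizes` and the parameter name `concentration` are assumptions introduced here.

```python
import random

def crp_table_sizes(n_customers, concentration, seed=0):
    """Simulate a Chinese restaurant process and return the cluster sizes.

    Each new item joins an existing cluster with probability proportional
    to that cluster's size, or opens a new cluster with probability
    proportional to `concentration`.
    """
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_customers):
        weights = sizes + [concentration]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(sizes):
            sizes.append(1)      # open a new cluster
        else:
            sizes[choice] += 1   # join an existing cluster
    return sizes
```

The number of clusters is `len(crp_table_sizes(...))`; it is not fixed in advance, and larger concentration values tend to produce more clusters for the same amount of data.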


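HDP

[Figure: graphical model with the variables θ_d, z_n, w_n, β_k, the hyper-parameters α and λ, and the plates N, M, K; the figure also carries the label "Inference"]

Topics: For each topic k ∈ {1, ..., ∞}, draw a multinomial distribution β_k from a Dirichlet distribution.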


Generative: For each document d ∈ {1, ..., M}, draw a multinomial distribution θ_d from a Dirichlet distribution with parameter α.


Generative: For each word position n ∈ {1, ..., N}, select a hidden topic z_n from the multinomial distribution parameterized by θ_d.


Generative: Choose the observed word w_n from the distribution β_{z_n}.
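The generative steps above can be sketched as a small sampler. Two simplifications are assumed here and are not taken from the slides: the number of topics is truncated to a finite `n_topics`, and both Dirichlet priors are symmetric, with `alpha` for the document-topic distributions and `lam` for the topic-word distributions. All function and parameter names are hypothetical.

```python
import random

def dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet by normalizing Gamma samples."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow
        return [1.0 / size] * size
    return [d / total for d in draws]

def generate_corpus(n_docs, n_words, n_topics, vocab_size, alpha, lam, seed=0):
    """Sample documents as lists of (topic, word) pairs."""
    rng = random.Random(seed)
    # For each topic k, draw a word distribution beta_k.
    beta = [dirichlet(rng, lam, vocab_size) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        # For each document d, draw a topic distribution theta_d.
        theta = dirichlet(rng, alpha, n_topics)
        doc = []
        for _ in range(n_words):
            # Select a hidden topic z_n from theta_d.
            z = rng.choices(range(n_topics), weights=theta)[0]
            # Choose the observed word w_n from beta_{z_n}.
            w = rng.choices(range(vocab_size), weights=beta[z])[0]
            doc.append((z, w))
        corpus.append(doc)
    return beta, corpus
```

Inference runs in the opposite direction: only the words are observed, and the topics and their assignments have to be estimated.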


Experiments and Results

We performed several experiments on the Cavy dataset and the benchmark dataset Behave [Blunsden, Scott, et al., 2010].
As the Cavy dataset does not contain ground truth, we marked the semantically meaningful interactions in the scene.
Then, similar to the procedures in [Kuettel, 2010; Krishna, 2014], the output of our system is manually mapped to the ground-truth labels and the accuracy is calculated.
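The evaluation step can be sketched as follows, assuming each clip has been assigned one cluster id and a hand-made dictionary maps cluster ids to interaction labels. The function name `mapped_accuracy`, the argument names, and the fallback label for unmapped clusters are assumptions made for illustration.

```python
def mapped_accuracy(cluster_ids, true_labels, cluster_to_label, default="NoInt"):
    """Accuracy after manually mapping unsupervised clusters to labels.

    cluster_ids: one cluster id per clip, as produced by the model.
    true_labels: one ground-truth interaction label per clip.
    cluster_to_label: hand-made mapping from cluster id to label.
    default: hypothetical label used for clusters that were not mapped.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    predicted = [cluster_to_label.get(c, default) for c in cluster_ids]
    correct = sum(p == t for p, t in zip(predicted, true_labels))
    return correct / len(true_labels)
```

For example, with clusters `[0, 0, 1, 2]`, labels `["Approach", "Approach", "Split", "Fight"]`, and the mapping `{0: "Approach", 1: "Split", 2: "InGroup"}`, three of the four clips match.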


Behave Dataset

The Behave dataset consists of four video sequences with 76,800 frames in total.
Recorded at 25 frames per second with a resolution of 640 × 480 pixels.
The number of objects involved in an interaction ranges from 2 to 5.
Tracking ground truth is available, but not for the whole dataset.


Comparison

Interaction recognition comparison with [Kim, 2014] and [Munch, 2012] (accuracy in %)

Category       Our     [Kim, 2014]   [Munch, 2012]
Approach       68.42   83.33         60.00
Split          66.42   100.00        70.00
WalkTogether   75.00   91.66         45.00
InGroup        53.73   100.00        90.00
Average        65.95   93.74         66.25


Interaction recognition comparison with [Kim, 2014] and [Yin, 2012] (accuracy in %)

Category       Our     [Kim, 2014]   [Yin, 2012]
Split          66.42   100.00        93.10
WalkTogether   75.00   91.66         92.10
InGroup        53.73   100.00        94.30
Fight          80.00   83.33         95.10
Average        65.95   93.74         93.65


Cavy Dataset

The sequences are recorded from different views, with changing illumination and in different periods.
The dataset contains 16 sequences with a resolution of 640 × 480 pixels, recorded at 7.5 frames per second (fps), with approx. 3 million frames in total (272 GB).
It contains five dominant interactions, performed several times by two or three cavies.


Interaction   Description
Approach      One object approaches one or more other objects
InGroup       Several objects are close to each other and show little motion
Fight         Objects fight each other
Split         Object(s) split from one another
Follow        Object(s) follow another object


Confusion Matrix

           Approach  Split  InGroup  Follow  Fight  NoInt
Approach   0.51      0.03   0.05     0.00    0.00   0.41
Split      0.01      0.28   0.03     0.00    0.01   0.67
InGroup    0.03      0.01   0.40     0.00    0.02   0.54
Follow     0.00      0.25   0.13     0.50    0.00   0.13
Fight      0.02      0.00   0.10     0.00    0.35   0.53
NoInt      0.06      0.01   0.14     0.01    0.05   0.73

Partially recovered values from a "#" column of the slide: 6175, 3738, 48392 (the rows they belong to are not recoverable).
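Each row of the confusion matrix sums to about 1, which suggests that the entries are normalized per row. A minimal sketch of how such a row-normalized matrix can be computed from per-clip labels is given below; the function and argument names are hypothetical, and which of the two label sets forms the rows is an assumption.

```python
def row_normalized_confusion(true_labels, predicted_labels, classes):
    """Build a confusion matrix whose rows are divided by their row sums.

    Rows correspond to true labels and columns to predicted labels, both
    in the order given by `classes`. Rows without samples stay all zero.
    """
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

With this convention the diagonal entries are the per-class recognition rates.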


Analysis

Different factors have an effect on the results, such as:
Errors arising from the detector (split objects, false detections, missing detections, merged objects).
Optical flow for fixed objects.

[Figure: examples of detector errors with the panels Split, False, Missing, Merge]


Conclusion

Our proposed approach combines the unsupervised clustering capabilities of the HDP with spatio-temporal features.
Furthermore, the Cavy dataset is introduced in this work.
The experiments have been performed on the Cavy dataset and the Behave dataset.
Our approach achieved an accuracy of up to 65.95% on the Behave dataset and up to 45% on the Cavy dataset.


Improvement

Robust detector and tracker.
Appearance-based features (SIFT, HOG and CNN).
Trajectory-based features (velocity, distance).


Thank you for your attention!


The Cavy dataset and annotated interactions are available at http://www.inf-cv.uni-jena.de/interaction_recognition.html


[Figure: effect of the hyper-parameter η on the number of extracted interactions; x-axis: hyper-parameter η (0.1 to 2), y-axis: # of interactions (10 to 40)]


[Figure: effect of the hyper-parameter η on the accuracy; x-axis: hyper-parameter η (0 to 2), y-axis: accuracy (0.5 to 0.7)]


References

Jiang, Xiaoyan and Rodner, Erik and Denzler, Joachim. Multi-person tracking-by-detection based on calibrated multi-camera systems. Computer Vision and Graphics, 2012.

Zach, Christopher and Pock, Thomas and Bischof, Horst. A duality based approach for realtime TV-L1 optical flow. Pattern Recognition, 2007.

Blunsden, Scott and Fisher, RB. The BEHAVE video dataset: ground truthed video for multi-person behavior classification. British Machine Vision Association, 2010.

Kim, Young-Ji and Cho, Nam-Gyu and Lee, Seong-Whan. Group Activity Recognition with Group Interaction Zone. ICPR, 2014.

Munch, David and Michaelsen, Eckart and Arens, Michael. Supporting fuzzy metric temporal logic based situation recognition by mean shift clustering. Advances in Artificial Intelligence, 2012.

Yin, Yafeng and Yang, Guang and Xu, Jin and Man, Hong. Small group human activity recognition. ICIP, 2012.

Kuettel, Daniel and Breitenstein, Michael D and Van Gool, Luc and Ferrari, Vittorio. What's going on? Discovering spatio-temporal dependencies in dynamic scenes. CVPR, 2010.

Krishna, Mahesh and Denzler, Joachim. A Combination of Generative and Discriminative Models for Fast Unsupervised Activity Recognition from Traffic Scene Videos. Proceedings of the IEEE (WACV), 2014.

Teh, Yee Whye and Jordan, Michael I and Beal, Matthew J and Blei, David M. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 2006.

Gorelick, Lena and Blank, Moshe and Shechtman, Eli and Irani, Michal and Basri, Ronen. Actions as Space-Time Shapes. Transactions on Pattern Analysis and Machine Intelligence, 2007.

Ryoo, M. S. and Aggarwal, J. K. UT-Interaction Dataset, ICPR contest on Semantic Description of Human Activities (SDHA). ICPR, 2010.
