Unsupervised Framework for Interactions Modeling between Multiple Objects
Ali Al-Raziqi, Joachim Denzler
Computer Vision Group
Department of Mathematics and Computer Science
Friedrich Schiller University of Jena, Germany
Ali.Al-Raziqi,[email protected]
http://www.inf-cv.uni-jena.de/
March 4, 2016
Introduction | Interaction Modeling | Experiments and Results | Conclusion
Friedrich Schiller University Jena
Computer Vision Group
Outline
1 Introduction
2 Interaction Modeling
3 Experiments and Results
4 Conclusion
Ali Al-Raziqi, Joachim Denzler Interactions Modeling 1 of 23
Introduction
Activity recognition
Activity datasets: [Gorelick, 2007, Ryoo, 2010, Blunsden, Scott, et al. 2010]
Motivation
Our goal is to build an unsupervised system that extracts the interactions between objects in a video sequence.
Current object interaction modeling systems mostly rely on supervised learning methods.
Interaction samples (InGroup and Fight)
Interaction Modeling
Tracking
All cavies are tracked with a tracking-by-detection method: the cavies are first detected in each frame.
These detections are then associated across successive frames using the two-stage graph tracking approach of [Jiang, Xiaoyan, et al., 2012].
Optical Flow
The output of the tracking algorithm is represented as bounding boxes (BBs).
Optical flow inside the BB regions is computed using the TV-L1 algorithm [Zach, Christopher, et al., 2007].
One flow word is w = (x_i, y_i, u_i, v_i), with the flow quantized into eight directions.
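The quantization of a flow vector into eight directions can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the spatial cell size `cell`, and the magnitude threshold `min_mag` are assumptions introduced here, and the flow values (u, v) are taken as already computed.

```python
import math

def quantize_direction(u, v, n_bins=8):
    """Map a flow vector (u, v) to one of n_bins direction indices.

    Bin 0 is centred on the positive x-axis; bins advance in steps of
    2*pi / n_bins with increasing angle.
    """
    angle = math.atan2(v, u) % (2 * math.pi)   # angle in [0, 2*pi)
    width = 2 * math.pi / n_bins
    return int((angle + width / 2) // width) % n_bins

def flow_word(x, y, u, v, cell=10, min_mag=1.0):
    """Build a discrete flow word from a pixel position and its flow.

    Assumption: positions are quantized into cell x cell pixel blocks and
    near-static pixels (flow magnitude below min_mag) yield no word.
    """
    if math.hypot(u, v) < min_mag:
        return None
    return (x // cell, y // cell, quantize_direction(u, v))
```

With these assumed settings, a pixel at (25, 47) moving along the positive x-axis maps to the word `(2, 4, 0)`.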
Bag-of-Words
[Figure: two example clips, each shown as a table of flow words and their counts]
Clips
The videos are divided into clips of equal size.
Each clip is represented by its flow words.
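Splitting a video into equal-size clips and counting the flow words in each one can be sketched as below. The input layout (one list of word ids per frame) and the function name are assumptions made for illustration.

```python
from collections import Counter

def clips_to_bags(frame_words, clip_len):
    """Group per-frame flow words into clips and count words per clip.

    frame_words: list with one entry per frame, each a list of word ids.
    clip_len:    number of frames per clip (the last clip may be shorter).
    Returns a list of Counter objects, one bag-of-words per clip.
    """
    bags = []
    for start in range(0, len(frame_words), clip_len):
        bag = Counter()
        for words in frame_words[start:start + clip_len]:
            bag.update(words)
        bags.append(bag)
    return bags
```

Each resulting bag plays the role of one document when the clips are passed to the topic model.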
[Figure: full pipeline. Stages shown: Sequence, Tracking, Flow Words Extraction, Dictionary, Clips, Bag-of-Words, HDP Model, Interactions]
Why topic models?
Assumption
Suppose you have a huge number of documents
You want to know what's going on
You can't read them all (e.g. every New York Times article from the 90's)
Topic models offer a way to get a corpus-level view of major themes
Unsupervised
Some slides are taken from Jordan Boyd-Graber with permission
Conceptual Approach
From an input corpus and number of topics K → words to topics
Corpus (example headlines):
Forget the Bootleg, Just Download the Movie Legally
Multiplex Heralded As Linchpin To Growth
The Shape of Cinema, Transformed At the Click of a Mouse
A Peaceful Crew Puts Muppets Where Its Mouth Is
Stock Trades: A Better Deal For Investors Isn't Simple
The three big Internet portals begin to distinguish among themselves as shopping malls
Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens
Three example topics (TOPIC 1, TOPIC 2, TOPIC 3), each shown by its top words:
computer, technology, system, service, site, phone, internet, machine
play, film, movie, theater, production, star, director, stage
sell, sale, store, product, business, advertising, market, consumer
Generative Model
Example document:
Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's iTunes music store and other online services ...
Topics:
computer, technology, system, service, site, phone, internet, machine
play, film, movie, theater, production, star, director, stage
sell, sale, store, product, business, advertising, market, consumer
Hierarchical Dirichlet Process (HDP)
HDP was originally designed for clustering words in documents based on word co-occurrences rather than distances in feature space [Teh, Yee Whye, 2006].
The number of clusters is inferred automatically from the data and the hyper-parameters.
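To illustrate how a Dirichlet-process prior leaves the number of clusters open, the sketch below simulates a single-level Chinese restaurant process. It is a simplified illustration rather than the full HDP or the inference used in this work; the function name and the `concentration` parameter are assumptions made here.

```python
import random

def crp_assignments(n_items, concentration, seed=0):
    """Simulate a Chinese restaurant process.

    Each item joins an existing cluster with probability proportional to
    the cluster size, or opens a new cluster with probability proportional
    to the concentration parameter.
    """
    rng = random.Random(seed)
    table_sizes = []      # current size of every cluster
    assignments = []      # cluster index chosen for every item
    for _ in range(n_items):
        weights = table_sizes + [concentration]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)        # open a new cluster
        else:
            table_sizes[table] += 1      # join an existing cluster
        assignments.append(table)
    return assignments, table_sizes
```

In this simplified process, larger concentration values tend to produce more clusters for the same number of items.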
HDP
[Figure: plate diagram with nodes α, θ_d, z_n, w_n, β_k, λ and plates M, N, K]
Generative process
For each topic k ∈ {1, ..., ∞}, draw a multinomial distribution β_k from a Dirichlet distribution.
For each document d ∈ {1, ..., M}, draw a multinomial distribution θ_d from a Dirichlet distribution with parameter α.
For each word position n ∈ {1, ..., N}, select a hidden topic z_n from the multinomial distribution parameterized by θ_d.
Choose the observed word w_n from the distribution β_{z_n}.
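The generative steps above can be sketched in a few lines. Note the assumptions: the number of topics is truncated to a finite K (the HDP itself does not fix K), symmetric Dirichlet parameters `alpha` and `lam` are used, and all function and variable names are illustrative only.

```python
import random

def dirichlet(alphas, rng):
    """Draw one sample from a Dirichlet distribution via gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(M, N, K, V, alpha, lam, seed=0):
    """Generate M documents of N words over a vocabulary of size V."""
    rng = random.Random(seed)
    # One word distribution beta_k per topic (finite truncation to K topics).
    beta = [dirichlet([lam] * V, rng) for _ in range(K)]
    docs = []
    for _ in range(M):
        # Per-document topic proportions theta_d.
        theta = dirichlet([alpha] * K, rng)
        words = []
        for _ in range(N):
            z = rng.choices(range(K), weights=theta)[0]    # hidden topic z_n
            w = rng.choices(range(V), weights=beta[z])[0]  # observed word w_n
            words.append(w)
        docs.append(words)
    return beta, docs
```

In the setting of this work, a document corresponds to a clip and a word to a flow word.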
Experiments and Results
We performed several experiments on the Cavy dataset and the benchmark Behave dataset [Blunsden, Scott, et al. 2010].
As the Cavy dataset does not contain ground truth, we manually annotated the semantically meaningful interactions in the scene.
Then, similar to the procedures in [Kuettel,2010, Krishna,2014], the output of our system is manually mapped to the ground truth labels and the accuracy is calculated.
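Once a manual mapping from extracted topics to ground truth labels exists, the accuracy computation can be sketched as below. The mapping dictionary, the function name, and the choice to treat unmapped topics as "NoInt" are assumptions for illustration.

```python
def accuracy_after_mapping(predicted_topics, true_labels, topic_to_label):
    """Apply a manual topic-to-label mapping and compute accuracy.

    predicted_topics: topic id assigned to each clip by the model.
    true_labels:      annotated interaction label of each clip.
    topic_to_label:   manually chosen mapping from topic id to label;
                      unmapped topics are treated as "NoInt" (assumption).
    """
    mapped = [topic_to_label.get(t, "NoInt") for t in predicted_topics]
    correct = sum(m == t for m, t in zip(mapped, true_labels))
    return correct / len(true_labels)
```

For example, with the mapping `{0: "Approach", 1: "Split"}`, predicted topics `[0, 1, 1, 2]` and labels `["Approach", "Split", "Fight", "Split"]`, two of the four clips are counted as correct.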
Behave Dataset
The Behave dataset consists of four video sequences with 76,800 frames in total.
It was recorded at 25 frames per second with a resolution of 640 × 480 pixels.
The number of objects involved in an interaction ranges from 2 to 5.
Tracking ground truth is available, but not for the whole dataset.
Comparison
Interaction recognition comparison with [Kim,2014] and [Munch,2012]
Category       Our     [Kim,2014]   [Munch,2012]
Approach       68.42   83.33        60.00
Split          66.42   100.00       70.00
WalkTogether   75.00   91.66        45.00
InGroup        53.73   100.00       90.00
Average        65.95   93.74        66.25
Interaction recognition comparison with [Kim,2014] and [Yin,2012]
Category       Our     [Kim,2014]   [Yin,2012]
Split          66.42   100.00       93.10
WalkTogether   75.00   91.66        92.10
InGroup        53.73   100.00       94.30
Fight          80.00   83.33        95.10
Average        65.95   93.74        93.65
Cavy Dataset
Sequences are recorded from different views, with changing illumination and in different periods.
It contains 16 sequences at a resolution of 640 × 480 pixels, recorded at 7.5 frames per second (fps), with approx. 3 million frames in total (272 GB).
It contains five dominant interactions, each performed several times by two or three cavies.
Interaction   Description
Approach      One object approaches one or more other objects
InGroup       Several objects are close to each other with little motion
Fight         Objects fight each other
Split         Object(s) split from one another
Follow        Object(s) follow another object
Confusion Matrix
           Approach   Split   InGroup   Follow   Fight   NoInt
Approach   0.51       0.03    0.05      0.00     0.00    0.41
Split      0.01       0.28    0.03      0.00     0.01    0.67
InGroup    0.03       0.01    0.40      0.00     0.02    0.54
Follow     0.00       0.25    0.13      0.50     0.00    0.13
Fight      0.02       0.00    0.10      0.00     0.35    0.53
NoInt      0.06       0.01    0.14      0.01     0.05    0.73
#6175
3738
48392
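A row-normalized confusion matrix of this kind can be computed from per-clip labels as sketched below. The assumption that rows correspond to ground truth and columns to predictions, as well as the function name, are introduced here for illustration.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Return raw counts and row-normalized rates.

    Assumption: rows index the ground truth class and columns index the
    predicted class, so each non-empty normalized row sums to 1.
    """
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return counts, normalized
```

The diagonal of the normalized matrix then gives the per-class recognition rate.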
Analysis
Several factors affect the results:
Detector errors (split objects, false detections, missing detections, merged objects).
Optical flow computed for stationary objects.
[Figure: example detector errors: Split, False, Missing, Merge]
Conclusion
Our proposed approach combines the unsupervised clustering capabilities of the HDP with spatio-temporal features.
Furthermore, the Cavy dataset is introduced in this work.
Experiments were performed on the Cavy dataset and the Behave dataset.
Our approach achieved an accuracy of up to 65.95% on the Behave dataset and up to 45% on the Cavy dataset.
Improvement
Robust detector and tracker.
Appearance-based features (SIFT, HOG and CNN).
Trajectory-based features (velocity, distance).
Thank you for your attention!
The Cavy dataset and annotated interactions are available at http://www.inf-cv.uni-jena.de/interaction_recognition.html
[Figure: effect of hyper-parameter η on the number of extracted interactions; x-axis: hyper-parameter η, y-axis: number of interactions]
[Figure: effect of hyper-parameter η on the accuracy; x-axis: hyper-parameter η, y-axis: accuracy]
References
Jiang, Xiaoyan and Rodner, Erik and Denzler, Joachim. Multi-person tracking-by-detection based on calibrated multi-camera systems. Computer Vision and Graphics.
Zach, Christopher and Pock, Thomas and Bischof, Horst. A duality based approach for realtime TV-L1 optical flow. Pattern Recognition.
Blunsden, Scott and Fisher, R. B. The BEHAVE video dataset: ground truthed video for multi-person behavior classification. British Machine Vision Association.
Kim, Young-Ji and Cho, Nam-Gyu and Lee, Seong-Whan. Group Activity Recognition with Group Interaction Zone. ICPR.
Munch, David and Michaelsen, Eckart and Arens, Michael. Supporting fuzzy metric temporal logic based situation recognition by mean shift clustering. Advances in Artificial Intelligence.
Yin, Yafeng and Yang, Guang and Xu, Jin and Man, Hong. Small group human activity recognition. ICIP.
Kuettel, Daniel and Breitenstein, Michael D. and Van Gool, Luc and Ferrari, Vittorio. What's going on? Discovering spatio-temporal dependencies in dynamic scenes. CVPR.
Mahesh Krishna and Joachim Denzler. A Combination of Generative and Discriminative Models for Fast Unsupervised Activity Recognition from Traffic Scene Videos. Proceedings of the IEEE (WACV).
Teh, Yee Whye and Jordan, Michael I. and Beal, Matthew J. and Blei, David M. Hierarchical Dirichlet processes. Journal of the American Statistical Association.
Lena Gorelick and Moshe Blank and Eli Shechtman and Michal Irani and Ronen Basri. Actions as Space-Time Shapes. Transactions on Pattern Analysis and Machine Intelligence.
Ryoo, M. S. and Aggarwal, J. K. UT-Interaction Dataset, ICPR contest on Semantic Description of Human Activities (SDHA). ICPR.