72
Lecture 18: Human Motion Recognition Professor FeiFei Li Stanford Vision Lab Stanford Vision Lab Lecture 18 - Fei-Fei Li 7Mar11 1

Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Lecture 18: Human Motion Recognition

Professor Fei‐Fei Li

Stanford Vision LabStanford Vision Lab

Lecture 18 -Fei-Fei Li 7‐Mar‐111

Page 2: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

What we will learn today?

• IntroductionIntroduction

• Motion classification using template matching

i l ifi i i i l• Motion classification using spatio‐temporal features

• Motion classification using pose estimation

Lecture 18 -Fei-Fei Li 7‐Mar‐112

Page 3: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

search

sport analysis

Human

sport analysissurveillance

MotionAnalysis

h isynthesisbiomechanics

signlanguage

Lecture 18 -Fei-Fei Li 7‐Mar‐113

game interfacesmotion captureg g

Page 4: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

It’s challenging…

Variation in Appearance Variation in Pose

Occlusion & clutter

Lecture 18 -Fei-Fei Li 7‐Mar‐114

Adapted from http://luthuli.cs.uiuc.edu/~daf/tutorial.htmlVariation in view‐point

Page 5: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

It’s challenging…It s challenging…

• No clear action/movement classesNo clear action/movement classes– Unless in specific domain

• At different temporal spans motion may• At different temporal spans, motion may acquire different meanings

I i i fl h i• Interactions influence the meaning– With objects

– With people

• Goals & intentions

Lecture 18 -Fei-Fei Li 7‐Mar‐115

Page 6: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Current motion technologyCurrent motion technology

• Mostly action discrimination• Mostly action discrimination– Walking or jumping?

M tl i i l i• Mostly in simple scenarios

• We don’t have a winner feature/representation yet

Lecture 18 -Fei-Fei Li 7‐Mar‐116

Page 7: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

KTH Dataset

Lecture 18 -Fei-Fei Li 7‐Mar‐117

[Schuldt et al ICPR 04]

Page 8: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Weizmann Dataset

Bend JumpRunWave2P‐Jump

SideSkipWalk Wave1Jacks

Lecture 18 -Fei-Fei Li 7‐Mar‐118

[Blank et al ICCV 2005]

Page 9: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

UCF SportsLifting

13 action classes

Golf‐swing • Diving • Liftingd• Golf‐swing

• Kicking• Riding‐Horse• Run

Lecture 18 -Fei-Fei Li 7‐Mar‐119

• Walk • … [Rodriguez et al CVPR 2008]

Page 10: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Hollywood Human ActionsHollywood Human Actions•AnswerPhone•EatEat•FightPerson•GetOutCarH dSh k

12 Action •HandShake•Kiss•Run

Classes

•SitDown•SitUp•StandUp

[Laptev et al 

StandUp

Lecture 18 -Fei-Fei Li 7‐Mar‐1110

CVPR08/CVPR09]

Page 11: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Three representations

• Global templateGlobal template

• Local video patches

• Kinematic/body pose

Lecture 18 -Fei-Fei Li 7‐Mar‐1111

Page 12: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Fields of view

Lecture 18 -Fei-Fei Li 7‐Mar‐1112

Page 13: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

The Big PictureThe Big Picture

• Goal: Action classificationGoal: Action classification– E.g. walk left, walk right, run left, run right, etc

• Steps• Steps:1. Tracking person in video

2. Feature extraction

3. Classification• Nearest Neighbor

• AdaBoost

Lecture 18 -Fei-Fei Li 7‐Mar‐1113

Page 14: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

How To Gather Action Data

A person in a particular body 

T ki

configuration should always map to approximately the same stabilized image

> We lose all translational• Tracking – Simple correlation‐based tracker

– User‐initialized

‐> We lose all translational information but optical flow features allow us to distinguish between running and walking.

Lecture 18 -Fei-Fei Li 7‐Mar‐1114

• Forms a spatio‐temporal volume for each person

Page 15: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Low‐level Motion Features

1.  Compute Optical Flow (Lucas and Kanadealgorithm).

2.  Optical flow vector field F is split into horizontal and vertical components (Fx, Fy)., y

3 F and F are split into four non negative3.  Fx and Fy are split into four non‐negative channels Fx+, Fx‐ , Fy+, Fy‐.

4.  Each channel is blurred to reduce noise.

Lecture 18 -Fei-Fei Li

7‐Mar‐1115

Page 16: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Motion Context Descriptor: A Feature that Combines Shape and MotionA Feature that Combines Shape and Motion

2x2 grid,18‐bin radial histogram

histogramconcatenation

concatenate medium scale 

motion

normalized 120x120 box

3 information channels

Lecture 18 -Fei-Fei Li 7‐Mar‐1116

[Sorokin&Tran ECCCV 08]

histogram motion summaries

120x120 box channels

Page 17: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Classifying ActionsClassifying Actions

• Nearest Neighbors [Efros et al ICCV 2003]Nearest Neighbors [Efros et al ICCV 2003]

• Boosting [Fathi & Mori CVPR 2008]

i i [ & S ki CC 2008]• Metric Learning [Tran & Sorokin ECCV 2008]

Lecture 18 -Fei-Fei Li 7‐Mar‐1117

Page 18: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Nearest Neighbors[Efros et al, ICCV 2003]

• Assume we have template videos containing ssu e e a e te p ate deos co ta gexamples of actions we want to classify– We form motion descriptors (i.e. feature vectors) for both the train and test data

• Find the k‐nearest neighbors and use majority g j yvoting to identify the action class

• The motion descriptors are constructed to be• The motion descriptors are constructed to be discriminative so a simple classification algorithm is sufficient.

Lecture 18 -Fei-Fei Li 7‐Mar‐1118

Page 19: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Nearest Neighbor Results[Efros et al, ICCV 2003]

I t id N t i hb t hInput video: Nearest neighbor match:

Action Classification Result

Lecture 18 -Fei-Fei Li 7‐Mar‐1119

Page 20: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Boosting [Fathi and Mori CVPR 2008]

• Low‐level features alone are not capable of discriminating p gbetween positive and negative classes.

– They are weak‐classifiers

– Combine them via AdaBoost

• Mid‐level features are obtained via AdaBoost from low‐level featuresfeatures.

Fi l l ifi bt i d i Ad b t f id l l f t

(Examples of low‐level features selected for mid‐level features.)

Lecture 18 -Fei-Fei Li

• Final classifier obtained via Adaboost from mid‐level features.

7‐Mar‐1120

Page 21: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Boosting [Fathi and Mori CVPR 2008]

• Mid‐level motion featuresMid level motion features

• Final classifier

Lecture 18 -Fei-Fei Li 7‐Mar‐1121

Page 22: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Three representations

• Global templateGlobal template

• Local video patches

• Kinematic/body pose

Lecture 18 -Fei-Fei Li 7‐Mar‐1122

Page 23: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Big Picture: Local Video PatchesBig Picture: Local Video Patches

• Similar to SIFT, we would like to identify key points inSimilar to SIFT, we would like to identify key points in videos– Space‐time Interest Points

• As done in bag‐of‐words approach, we can combine h ‘ ’ f fthe ‘Space‐time Interest Points’ to form a feature vector

Bag of space time features– Bag‐of‐space‐time features

• Allows the use of a discriminative classifier (e.g. SVM)

Lecture 18 -Fei-Fei Li

Allows the use of a discriminative classifier (e.g. SVM)

7‐Mar‐1123

Page 24: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Motivation

Sparse and Localrepresentation

Spatio temporalSpatio-temporalinformation

Lecture 18 -Fei-Fei Li

Johansson, 1973

7‐Mar‐1124

Page 25: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Local Video PatchesNo global assumptions  

Consider local spatio‐temporal neighborhoods 

b ihand waving

boxing

Lecture 18 -Fei-Fei Li 7‐Mar‐1125

Page 26: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Local Video PatchesNo global assumptions  

Consider local spatio‐temporal neighborhoods 

b ihand waving

boxing

Lecture 18 -Fei-Fei Li 7‐Mar‐1126

Page 27: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Local Video Patches

Bag of space‐time features + Statistical classifier ‐> Action Classification[Schuldt’04, Niebles’06, Zhang’07, Laptev’08,’09]

Collection of space‐time patches

Space‐time interest point detection

Video Feature RepresentationVideo Feature Representation

ClassifierClassifier

Lecture 18 -Fei-Fei Li 7‐Mar‐1127

RepresentationRepresentation

Page 28: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Space‐Time Features: DetectorSpace Time Features: Detector

• Extend the idea of 2D interest points to 3DExtend the idea of 2D interest points to 3D interest points in space and time.

• Build on the idea of Harris and Förstnerinterest point operators

• We want to find regions with large variation in b th d ti d iboth space and time domains

Lecture 18 -Fei-Fei Li 7‐Mar‐1128

Page 29: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Space‐Time Features: Detector Space‐time corner detector

[Laptev, IJCV 2005]

Dense scale sampling (no explicit scale selection)

Lecture 18 -Fei-Fei Li 7‐Mar‐1129

Page 30: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Space‐Time Features: Detectorp

Feature extraction given interest point locations:

Histogram of oriented  Histogram of oriented 

Lecture 18 -Fei-Fei Li 7‐Mar‐1130

gradient (HOG) flow (HOF)

Page 31: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Codebook and

Training …

and Representation

Videos …

Feature Space

Lecture 18 -Fei-Fei Li 7‐Mar‐1131

Page 32: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Codebook and

Training …

and Representation

Videos … Bag of features!

Input Video

Feature Space

C codewords

Codewords

Lecture 18 -Fei-Fei Li 7‐Mar‐1132

Page 33: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Classifying ActionsClassifying Actions• SVM [Schuldt et al ICPR04, Laptev CVPR09]

• Topic Models (pLSA, LDA) [Niebles et al IJCV 08]

• ……

Lecture 18 -Fei-Fei Li 7‐Mar‐1133

Page 34: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Topic Models (pLSA, LDA) [Niebles et al IJCV 08]

Lecture 18 -Fei-Fei Li 7‐Mar‐1134

Page 35: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

ObjectObject Bag of ‘words’Bag of ‘words’

Lecture 18 -Fei-Fei Li

Visual dictionary

7‐Mar‐1135

Page 36: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

ActionAction Bag of motion ‘words’Bag of motion ‘words’

Lecture 18 -Fei-Fei Li 7‐Mar‐1136

Visual dictionary

Page 37: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Codebookwalking running joggingwalking running jogging

boxinghandclappinghandwaving

Lecture 18 -Fei-Fei Li

boxinghandclappinghandwaving

7‐Mar‐1137

Page 38: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Unsupervised learning using pLSA

NWd

z wd

spatio temporal word

input

spatio‐temporal word

action category“camel spin”

Lecture 18 -Fei-Fei Li

pJ.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐1138

Page 39: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

pLSA Model

NWd

z wd

K

K

kjkkiji dzpzwpdwp

1)|()|()|(

action category vectors

Word distribution per action category

action category weights

Action category distribution per video

Lecture 18 -Fei-Fei Li

g y p

7‐Mar‐1139

J.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

Page 40: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

pLSA Model

NWd

z wd

K

k

jkkiji dzpzwpdwp1

)|()|()|(

Unsupervised LearningUnsupervised Learning

M Ndwn

jijidwpL ,)|(

Lecture 18 -Fei-Fei Li

i j1 1

7‐Mar‐1140

Page 41: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Recognition

New video

Codewords

Word distribution given action category

Action category distribution given test video given action categorygiven test video

zw PdzP dwP K

k ktestktest

1|||

testk dzP |maxargcategoryaction

Lecture 18 -Fei-Fei Li

k

7‐Mar‐1141

Page 42: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

KTH d t t

Experiment I:

KTH dataset[Schuldt et al., 2004]:

25 persons, indoors and outdoors, 4 long sequences per person

Lecture 18 -Fei-Fei Li

p , , g q p pJ.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐1142

Page 43: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Experiment I: Performance

• Leave-one person out cross validation

• Average performance: 81 50%

• Unsupervised training• Handle multiple motions

• Average performance: 81.50%

81.50% 81.17%85%

71 72%75%

80%

71.72%

70%

62.96%

60%

65%

Our Dollar et Schuldt Ke et al

Lecture 18 -

OurMethod

Dollar etal.

Schuldtet al.

Ke et al.

7‐Mar‐11

Page 44: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Experiment I: Caltech dataset

walkingg

running

Trained with the KTH data

Tested with the Caltech dataset

Onl o ds f om the co esponding action a e sho n

Lecture 18 -

Only words from the corresponding action are shownJ.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐11

Page 45: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Experiment I: A longer sequence

walkingwalking

running

Trained with the KTH data

Tested with our own data

Lecture 18 -

J.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐11

Page 46: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Experiment I: Multiple motions

handclapping

handwaving

Trained with the KTH data

Tested with our own dataown data

Lecture 18 -

J.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐11

Page 47: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Figure Skating data set:

Experiment II:Figure Skating data set:[Y.Wang, G.Mori et al, CVPR 2006]

7 pe sons 3 action classes camel spin stand spin sit spin

Lecture 18 -Fei-Fei Li

7 persons, 3 action classes: camel spin, stand spin, sit spinJ.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐1147

Page 48: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Experiment II: Examples

Figure skating actions

Camel spin Sit spin Stand spinCa e sp S t sp p

Lecture 18 -Fei-Fei Li

J.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐1148

Page 49: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Experiment II: Long Sequences

Lecture 18 -Fei-Fei Li

J.C. Niebles, H. Wang, L. Fei‐Fei,  BMVC, 2006

7‐Mar‐1149

Page 50: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

But…

W d d l th t l it t i l t f f t

Lecture 18 -Fei-Fei Li

• We need models that exploit geometrical arrangements of features

7‐Mar‐1150

Page 51: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

In the object recognition world

• all have same P under bag‐of‐words model.

• part‐based models that capture geometrical informationC t ll ti d l– Constellation model

– Pictorial structures

– etc

Lecture 18 -Fei-Fei Li 7‐Mar‐1151

Page 52: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

bags of featuresConstellation Model

P1 P2

P3 P4

w w

Image Image

Bg

Image g

Strong shape representation Large number of features

Lecture 18 -Fei-Fei Li

Small number of features No shape information

7‐Mar‐1152

Page 53: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

bags of featuresConstellation ModelConstellation of bags of features

P1 P2 P1 P2

P3 P4 P3 P4

w ww

Image ImageImage

Bg

Image g

Bg

Image

Strong shape representation Large number of features

Lecture 18 -Fei-Fei Li

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

7‐Mar‐1153

Page 54: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Constellation ModelConstellation of bags of 

features

P1 P2 P1 P2

P3 P4 P3 P4P3 P4 P3 P4

w w

Bg

Image

Bg

Image

Lecture 18 -Fei-Fei Li 7‐Mar‐1154

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 55: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Constellation of bags of features

P1 P2

P3 P4

Partlayer

P3 P4

wFeaturelayer

Bg

Image

Lecture 18 -Fei-Fei Li 7‐Mar‐1155

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 56: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Constellation of bags of featuresθω

bendP1 P2

θω

P3 P4P3 P4

w

• Use a mixture to account for data multimodality. Bg

Image

Lecture 18 -Fei-Fei Li 7‐Mar‐1156

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 57: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Constellation of bags of featuresθω

P1 P2

θω

Partlayer

P3 P4P3 P4

Featurelayer

w

Bg

Image

Lecture 18 -Fei-Fei Li 7‐Mar‐1157

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 58: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

θω

Mixture model

Single component

bag of words

P

P1 P2

PP3

P1 P2

P4P3 P4

w

3 4

w w

BgImage

BgImage Image

Classification performance

Lecture 18 -Fei-Fei Li 7‐Mar‐1158

Page 59: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Constellation of bags of featuresθω

P1 P2

θω

P3 P4

Partlayer

P3 P4

wFeaturelayer

Bg

Image

Lecture 18 -Fei-Fei Li 7‐Mar‐1159

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 60: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Constellation of bags of features

S i fθω• Static features

Edge map + Shape Context[Belongie ’03]

P1 P2

θω

• Motion featuresInterest Points + ST‐gradients[D ll ’05] P3 P4[Dollar ’05] P3 P4

Featurelayer

w

Bg

Image

Lecture 18 -Fei-Fei Li 7‐Mar‐1160

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 61: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Constellation of bags of featuresθωS i f

P1 P2

θω• Static featuresEdge map + Shape Context

[Belongie ’03]

P3 P4

• Motion featuresInterest Points + ST‐gradients[D ll ’05] P3 P4[Dollar ’05]

w

Bg

Image

Lecture 18 -Fei-Fei Li 7‐Mar‐1161

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 62: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Dataset

Bend JumpRunWave2P-Jump

SideSkipWalk Wave1Jacks

“Actions as Space‐Time Shapes”M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri.IEEE International Conference on Computer Vision (ICCV), Beijing, October 2005.

SideSkipWalk Wave1Jacks

Lecture 18 -Fei-Fei Li

f p ( ) j g

7‐Mar‐1162

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 63: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Experiment

• 9 action classes,performed by 9subjects [Blank et al 2005]

• Leave one outcross‐validation

• Video Classificationfperformance: 72.8% 

Lecture 18 -Fei-Fei Li 7‐Mar‐1163

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 64: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Recognition with constellation of bags of features 

Predicted frame label

Ground truth frame label frame label

PredictedAccumulated votes

Predicted sequence label

Correct Incorrect

Lecture 18 -Fei-Fei Li 7‐Mar‐1164

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 65: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Recognition with constellation of bags of features 

Predicted frame label

Ground truth frame label frame label

PredictedAccumulated votes

Predicted sequence label

Correct Incorrect

Lecture 18 -Fei-Fei Li 7‐Mar‐1165

J.C. Niebles, & L. Fei‐Fei,  CVPR, 2007

Page 66: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Three representations

• Global templateGlobal template

• Local video patches

• Kinematic/body pose

Lecture 18 -Fei-Fei Li 7‐Mar‐1166

Page 67: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Lecture 18 -Fei-Fei Li 7‐Mar‐1167

[Ramanan& Forsyth, NIPS 2003]

Page 68: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Lecture 18 -Fei-Fei Li 7‐Mar‐1168

Page 69: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Lecture 18 -Fei-Fei Li 7‐Mar‐1169

Page 70: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

search

sport analysis

Human

sport analysissurveillance

MotionAnalysis

th isynthesisbiomechanics

signlanguage

Lecture 18 -Fei-Fei Li 7‐Mar‐1170

game interfacesmotion capture

Page 71: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

Useful SurveysUseful Surveys

• Turaga, P.K., Chellappa, R., Subrahmanian, V.S., Udrea, g , , pp , , , , ,O.Machine Recognition of Human Activities: A SurveyCirSysVideo(18), No. 11, November 2008, pp. 1473‐14881488.

• David A. Forsyth, Okan Arikan, Leslie Ikemoto.y , ,Computational Studies of Human Motion: Tracking and Motion Synthesis, Part 1

Lecture 18 -Fei-Fei Li 7‐Mar‐1171

Page 72: Lecture 18: Human Motion Recognitionvision.stanford.edu/teaching/cs231a_autumn1213_internal/... · 2012. 9. 22. · –Simple correlation‐based tracker –User‐initialized ‐

What we have learned today?

• IntroductionIntroduction

• Motion classification using template matching

i l ifi i i i l• Motion classification using spatio‐temporal features

• Motion classification using pose estimation

Lecture 18 -Fei-Fei Li 7‐Mar‐1172