

CS 1699: Intro to Computer Vision

Human Pose and Actions

Prof. Adriana Kovashka, University of Pittsburgh

December 8, 2015


Today

• Human pose and actions: Introduction

• Estimating human pose

• Recognizing human actions

– Using specialized features

– Using pose

– Using objects

– From ego-centric video


Next time (Last class)

• Review for the final exam + OMETs

• By Wednesday night, post on Piazza any questions or topics you want me to review, for participation credit

• Extra office hours on Friday, 2-3pm


Final Exam

• Monday, Dec. 14, 12pm

• Same room (5502 Sennott Square)

• Similar to the midterm exam (mostly short questions and a few problems), but longer (100 points)

• Will only cover topics discussed after the midterm (but some of these build on topics from the first half)


Homework 4

Mean = 77.16, median = 99, max = 123


Homework 5

• Due Thursday

• See Piazza for a correction about how to obtain the probabilities for Part III


Participation

• Tentative grades entered on CourseWeb

• Median is 80%


What is an action/activity?

Action: a transition from one state to another

• Who is the actor?

• How is the state of the actor changing?

• What (if anything) is being acted on?

• How is that thing changing?

• What is the purpose of the action (if any)?

• Could be more or less complex

Adapted from Derek Hoiem


Terminology: Human activity in video

No universal terminology, but approximately:

• “Actions”: atomic motion patterns – often gesture-like, single clear-cut trajectory, single nameable behavior (e.g., sit, wave arms)

• “Activity”: series or composition of actions (e.g., interactions between people)

• “Event”: combination of activities or actions (e.g., a football game, a traffic accident)

Adapted from Venu Govindaraju


How do we represent actions?

Categories: walking, hammering, dancing, skiing, sitting down, standing up, jumping

Poses

Nouns and predicates: <man, swings, hammer>, <man, hits, nail, w/ hammer>

Derek Hoiem


How can we identify actions?

• Motion

• Pose

• Held objects

• Nearby objects

Derek Hoiem

Today: Estimating human pose

Real-Time Human Pose Recognition in Parts from Single Depth Images

Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake

Best paper award at CVPR 2011

Adapted from Jamie Shotton

• Recognize a large variety of human poses, for people of all shapes and sizes

• Limited compute budget: super-real-time on the Xbox 360, to allow games to run concurrently

Adapted from Jamie Shotton

[Figure: body joints labeled on a depth image, e.g., right elbow, right hand, left shoulder, neck]

Jamie Shotton

• No temporal information: frame-by-frame

• Local pose estimate of parts: each pixel and each body joint treated independently, which reduces training data and computation time

• Very fast: simple depth image features and a parallel decision forest classifier

Jamie Shotton

Pipeline: capture depth image & remove background → infer body parts per pixel → cluster pixels to hypothesize body joint positions → fit model & track skeleton

Jamie Shotton

Compute $P(c_i \mid w_i)$ for each pixel $i = (x, y)$, where $c_i$ is the body part and $w_i$ is the image window around pixel $i$

Discriminative approach: learn the classifier $P(c_i \mid w_i)$ from training data

Jamie Shotton


Train invariance to:

Record mocap: 500k frames, distilled to 100k poses

Retarget to several models

Render (depth, body parts) pairs

Jamie Shotton

Depth comparisons

Very fast to compute

[Figure: input depth image with pixels x and offset probes x + Δ]

$$f_\Theta(I, \mathbf{x}) = d_I(\mathbf{x}) - d_I(\mathbf{x} + \Delta)$$

where $f_\Theta(I, \mathbf{x})$ is the feature response, $d_I(\mathbf{x})$ is the depth of image $I$ at image coordinate $\mathbf{x}$, $d_I(\mathbf{x} + \Delta)$ is the depth at the offset $\Delta$, and $\Theta$ denotes the feature parameters, which include the offset $\Delta$.

Adapted from Jamie Shotton
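A minimal Python sketch of the depth comparison feature, assuming the depth image is a list of rows, pixels are indexed as (row, col), and any probe that lands off the image or on a removed-background pixel (depth 0) reads a large constant. The helper names and the `BACKGROUND_DEPTH` value are hypothetical. The published method also scales the offset by $1/d_I(\mathbf{x})$ so the feature does not change with the person's distance from the camera; this sketch omits that step.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, col); the slides write x = (x, y)

# Hypothetical depth reading for probes off the image or on background (depth 0).
BACKGROUND_DEPTH = 1e6


def depth_at(depth: List[List[float]], x: Pixel) -> float:
    """Return d_I(x), or BACKGROUND_DEPTH if x is outside the image or background."""
    r, c = x
    if 0 <= r < len(depth) and 0 <= c < len(depth[0]) and depth[r][c] > 0:
        return depth[r][c]
    return BACKGROUND_DEPTH


def depth_feature(depth: List[List[float]], x: Pixel, delta: Pixel) -> float:
    """Feature response f(I, x) = d_I(x) - d_I(x + delta)."""
    probe = (x[0] + delta[0], x[1] + delta[1])
    return depth_at(depth, x) - depth_at(depth, probe)


# Toy 2x3 depth image in metres; 0.0 marks removed background.
depth = [[2.0, 2.0, 0.0],
         [2.0, 1.5, 0.0]]
print(depth_feature(depth, (1, 1), (0, -1)))  # -0.5
print(depth_feature(depth, (1, 1), (0, 1)))   # probe hits background: large negative
```

Each evaluation costs two memory reads and a subtraction, which is what makes it cheap enough to evaluate many times per pixel.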

Toy example: distinguish left (L) and right (R) sides of the body

To classify pixel $\mathbf{x}$, start at the root and test $f_\Theta(I, \mathbf{x}; \Delta_1) > t_1$. Depending on the yes/no answer, either descend to a second test $f_\Theta(I, \mathbf{x}; \Delta_2) > t_2$ or stop at a leaf. Each of the three leaves stores a distribution $P(c)$ over the classes L and R.

Adapted from Jamie Shotton
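The toy tree can be sketched in Python as below. The node classes, offsets, thresholds, leaf distributions, and which branch ends in a leaf are all made up for illustration; `feature(delta)` stands for any function that evaluates $f_\Theta(I, \mathbf{x}; \Delta)$ for the pixel being classified.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union

Offset = Tuple[int, int]


@dataclass
class Leaf:
    posterior: Dict[str, float]  # P(c) stored at this leaf


@dataclass
class Split:
    delta: Offset     # offset used by this node's depth comparison
    threshold: float  # go to `yes` if f(I, x; delta) > threshold
    yes: "Node"
    no: "Node"


Node = Union[Leaf, Split]


def classify_pixel(node: Node, feature: Callable[[Offset], float]) -> Dict[str, float]:
    """Descend from the root to a leaf and return its distribution P(c)."""
    while isinstance(node, Split):
        node = node.yes if feature(node.delta) > node.threshold else node.no
    return node.posterior


# Hypothetical toy tree for the left (L) / right (R) example.
tree = Split(delta=(0, 5), threshold=0.3,
             yes=Leaf({"L": 0.9, "R": 0.1}),
             no=Split(delta=(0, -5), threshold=0.3,
                      yes=Leaf({"L": 0.2, "R": 0.8}),
                      no=Leaf({"L": 0.5, "R": 0.5})))

# Pretend feature responses for one pixel.
responses = {(0, 5): 0.1, (0, -5): 0.6}
print(classify_pixel(tree, responses.__getitem__))  # {'L': 0.2, 'R': 0.8}
```

Each pixel follows a single root-to-leaf path, so its cost grows with tree depth rather than with the number of nodes, and pixels can be classified in parallel.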

[Figure: input depth, ground truth parts, and inferred parts (soft) for trees of depth 1 to 18]

Jamie Shotton

[Figure: average per-class accuracy (30% to 65%) vs. depth of trees, on synthetic test data (depths 8 to 20) and real test data (depths 5 to 20)]

Jamie Shotton
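The accuracy plots use average per-class accuracy. A minimal sketch of that metric, assuming it is the mean over classes of the fraction of each class's samples labeled correctly (the exact evaluation protocol is not given on the slides). Unlike overall accuracy, it keeps small parts such as hands from being swamped by large parts such as the torso. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def average_per_class_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean over classes c of (# samples of c predicted as c) / (# samples of c)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Overall accuracy is 3/4, but the hand class is always wrong.
truth = ["torso", "torso", "torso", "hand"]
pred = ["torso", "torso", "torso", "torso"]
print(average_per_class_accuracy(truth, pred))  # 0.5
```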

Decision forest: trees 1, …, T, each trained on a different random subset of images

• “Bagging” helps avoid over-fitting

• Average the tree posteriors:

$$P(c \mid I, \mathbf{x}) = \frac{1}{T} \sum_{t=1}^{T} P_t(c \mid I, \mathbf{x})$$

[Amit & Geman 97] [Breiman 01] [Geurts et al. 06]

Jamie Shotton
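The forest average is a one-line sum per class. A minimal sketch, assuming each tree's leaf distribution for pixel $\mathbf{x}$ has already been looked up and is given as a dict from body-part label to probability (a label that a tree omits counts as probability 0). The function name and the example distributions are hypothetical.

```python
from typing import Dict, List


def forest_posterior(tree_posteriors: List[Dict[str, float]]) -> Dict[str, float]:
    """P(c | I, x) = (1/T) * sum over t of P_t(c | I, x)."""
    T = len(tree_posteriors)
    if T == 0:
        raise ValueError("need at least one tree")
    labels = set().union(*tree_posteriors)  # every label seen by any tree
    return {c: sum(p.get(c, 0.0) for p in tree_posteriors) / T for c in sorted(labels)}


# Hypothetical leaf distributions reached by one pixel in T = 3 trees.
leaves = [{"L": 0.9, "R": 0.1}, {"L": 0.5, "R": 0.5}, {"L": 0.4, "R": 0.6}]
posterior = forest_posterior(leaves)
best = max(posterior, key=posterior.get)  # most likely body part
print(posterior, best)  # roughly {'L': 0.6, 'R': 0.4} L
```

Averaging trees trained on different random subsets smooths out the idiosyncratic errors of any single tree.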

[Figure: ground truth and inferred body parts (most likely) using 1, 3, and 6 trees; average per-class accuracy (40% to 55%) vs. number of trees (1 to 6)]

Jamie Shotton

[Figure: input depth, inferred body parts, and inferred joint positions in front, side, and top views]

Inferred joint positions: modes found using mean shift; no tracking or smoothing

Jamie Shotton
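The mode-finding step can be sketched with a weighted Gaussian mean shift. This illustrates the idea under stated assumptions rather than reproducing the published procedure: each pixel of one body part is given as a 3D point with a weight (here, simply its probability for that part), the bandwidth is a hypothetical value, and `propose_joint` is a hypothetical helper that starts from the highest-weight point to return one joint proposal.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # 3D position of a pixel (e.g. in metres)


def mean_shift_mode(points: Sequence[Point], weights: Sequence[float], start: Point,
                    bandwidth: float = 0.05, iters: int = 100, tol: float = 1e-6) -> Point:
    """Move `start` uphill on the weighted Gaussian density until it stops moving."""
    m = start
    for _ in range(iters):
        num = [0.0, 0.0, 0.0]
        den = 0.0
        for p, w in zip(points, weights):
            d2 = sum((pi - mi) ** 2 for pi, mi in zip(p, m))
            k = w * math.exp(-d2 / bandwidth ** 2)  # weighted Gaussian kernel
            for j in range(3):
                num[j] += k * p[j]
            den += k
        if den == 0.0:
            break  # no support near the current estimate
        new = (num[0] / den, num[1] / den, num[2] / den)
        moved = math.dist(new, m)
        m = new
        if moved < tol:
            break
    return m


def propose_joint(points: Sequence[Point], weights: Sequence[float],
                  bandwidth: float = 0.05) -> Point:
    """Hypothetical helper: one joint proposal, starting from the most confident pixel."""
    i = max(range(len(points)), key=lambda j: weights[j])
    return mean_shift_mode(points, weights, points[i], bandwidth)


# Hypothetical "right hand" pixels: a tight cluster plus one stray pixel far away.
hand_points = [(0.30, 1.20, 2.00), (0.31, 1.21, 2.00), (0.29, 1.19, 2.01), (1.00, 0.00, 2.50)]
hand_weights = [0.9, 0.8, 0.7, 0.6]
print(propose_joint(hand_points, hand_weights))  # close to (0.30, 1.20, 2.00)
```

Because the kernel weight of the stray pixel is effectively zero at this bandwidth, isolated misclassified pixels barely move the proposal, which is one reason mode finding is preferred over a plain weighted average.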


Today: Recognizing human actions using specialized features

Representing Actions

Tracked Points

Matikainen et al. 2009

Adapted from Derek Hoiem


Representing Actions

Space-Time Interest Points

Laptev 2005

• Corner detectors in space+time

Adapted from Derek Hoiem
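A minimal single-scale sketch of a corner detector in space+time, in the spirit of extending the Harris detector to (x, y, t): it builds the 3×3 second-moment matrix μ of the spatial and temporal gradients and scores each voxel with H = det(μ) − k·trace(μ)³, which is large only where intensity changes strongly along all three axes. Simplifications and assumptions: box averaging instead of Gaussian smoothing, a single spatial and temporal scale, k = 0.005 as an assumed constant, no local-maximum selection, and hypothetical function names.

```python
from typing import List, Tuple

Volume = List[List[List[float]]]  # V[t][y][x]: a grayscale video as nested lists


def _clamp(i: int, n: int) -> int:
    return min(max(i, 0), n - 1)


def gradients(V: Volume) -> Tuple[Volume, Volume, Volume]:
    """Central-difference derivatives (Lx, Ly, Lt), with indices clamped at the borders."""
    T, H, W = len(V), len(V[0]), len(V[0][0])
    Lx = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    Ly = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    Lt = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                Lx[t][y][x] = (V[t][y][_clamp(x + 1, W)] - V[t][y][_clamp(x - 1, W)]) / 2
                Ly[t][y][x] = (V[t][_clamp(y + 1, H)][x] - V[t][_clamp(y - 1, H)][x]) / 2
                Lt[t][y][x] = (V[_clamp(t + 1, T)][y][x] - V[_clamp(t - 1, T)][y][x]) / 2
    return Lx, Ly, Lt


def harris3d_response(V: Volume, r: int = 1, k: float = 0.005) -> Volume:
    """H = det(mu) - k * trace(mu)**3 at every voxel.

    mu is the 3x3 second-moment matrix of (Lx, Ly, Lt), box-averaged over a
    (2r+1)^3 neighbourhood (a simplification of Gaussian weighting).
    """
    T, H, W = len(V), len(V[0]), len(V[0][0])
    Lx, Ly, Lt = gradients(V)
    R = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                xx = yy = tt = xy = xt = yt = 0.0
                n = 0
                for dt in range(-r, r + 1):
                    for dy in range(-r, r + 1):
                        for dx in range(-r, r + 1):
                            a, b, c = _clamp(t + dt, T), _clamp(y + dy, H), _clamp(x + dx, W)
                            gx, gy, gt = Lx[a][b][c], Ly[a][b][c], Lt[a][b][c]
                            xx += gx * gx
                            yy += gy * gy
                            tt += gt * gt
                            xy += gx * gy
                            xt += gx * gt
                            yt += gy * gt
                            n += 1
                xx, yy, tt, xy, xt, yt = (v / n for v in (xx, yy, tt, xy, xt, yt))
                # mu = [[xx, xy, xt], [xy, yy, yt], [xt, yt, tt]]
                det = (xx * (yy * tt - yt * yt)
                       - xy * (xy * tt - yt * xt)
                       + xt * (xy * yt - yy * xt))
                R[t][y][x] = det - k * (xx + yy + tt) ** 3
    return R


# Toy 7x7x7 video whose intensity grows along x only: a static ramp, no space-time corner.
ramp = [[[float(x) for x in range(7)] for _ in range(7)] for _ in range(7)]
print(harris3d_response(ramp)[3][3][3])  # -0.005: mu has rank 1, so det = 0 and H = -k
```

Interest points would then be local maxima of H above a threshold; a static edge like the ramp yields a negative response because intensity varies along only one axis.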


Representing Actions

Laptev 2005

Example actions: “Talk on phone”, “Get out of car”

Derek Hoiem

Learning realistic human actions from movies, Laptev et al. 2008

Approach

• Space-time interest point detectors

• Descriptors

– HOG, HOF

• Pyramid histograms (3x3x2)

• SVMs with Chi-Squared Kernel

[Figure: interest points; spatio-temporal binning]

Derek Hoiem, figures from Ivan Laptev
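A minimal sketch of a χ² kernel for comparing two histogram descriptors, assuming the common exponential form K(h, h') = exp(−D(h, h')/A) with χ² distance D(h, h') = ½ Σ_n (h_n − h'_n)²/(h_n + h'_n), and A set to the mean pairwise distance (an assumed normalization). It covers a single descriptor channel; combining HOG and HOF channels and pyramid bins is left out. The resulting Gram matrix could be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.

```python
import math
from typing import List, Optional, Sequence

Histogram = Sequence[float]


def chi2_distance(h1: Histogram, h2: Histogram) -> float:
    """D(h1, h2) = 1/2 * sum_n (h1_n - h2_n)^2 / (h1_n + h2_n); bins empty in both are skipped."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def chi2_gram(hists: List[Histogram], A: Optional[float] = None) -> List[List[float]]:
    """Kernel matrix K[i][j] = exp(-D(h_i, h_j) / A)."""
    n = len(hists)
    D = [[chi2_distance(hists[i], hists[j]) for j in range(n)] for i in range(n)]
    if A is None:
        # Assumed normalization: mean distance over all distinct pairs.
        off = [D[i][j] for i in range(n) for j in range(n) if i != j]
        A = sum(off) / len(off) if off and sum(off) > 0 else 1.0
    return [[math.exp(-D[i][j] / A) for j in range(n)] for i in range(n)]


hists = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]
K = chi2_gram(hists)
print([round(v, 3) for v in K[0]])  # [1.0, 0.368, 0.368]
```

Dividing each squared difference by the bin total means a difference in a sparsely populated bin counts for more than the same difference in a heavily populated one, which tends to suit histogram features better than plain Euclidean distance.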


Results

Derek Hoiem, figures from Ivan Laptev

Today: Recognizing human actions using pose and objects


Human-Object Interaction

Holistic image-based classification vs. integrated reasoning:

• Human pose estimation (e.g., head, torso)

• Object detection (e.g., tennis racket)

• Action categorization (e.g., activity: tennis forehand)

Yao/Fei-Fei

Human pose estimation & Object detection

Human pose estimation is challenging:

• Difficult part appearance

• Self-occlusion

• Image region looks like a body part

Prior work:

• Felzenszwalb & Huttenlocher, 2005

• Ren et al., 2005

• Ramanan, 2006

• Ferrari et al., 2008

• Yang & Mori, 2008

• Andriluka et al., 2009

• Eichner & Ferrari, 2009

Yao/Fei-Fei


Human pose estimation & Object detection

Given that the object is detected, it facilitates human pose estimation.

Yao/Fei-Fei

Human pose estimation & Object detection

Object detection is challenging:

• Small, low-resolution, partially occluded

• Image region similar to detection target

Prior work:

• Viola & Jones, 2001

• Lampert et al., 2008

• Divvala et al., 2009

• Vedaldi et al., 2009

Yao/Fei-Fei


Human pose estimation & Object detection

Given that the pose is estimated, it facilitates object detection.

Yao/Fei-Fei

Human pose estimation & Object detection

Mutual context: pose estimation and object detection facilitate each other

Yao/Fei-Fei

Learning Results

[Figure: learned results for tennis serve, volleyball smash, and tennis forehand]

Yao/Fei-Fei

Activity Classification Results

Overall classification accuracy:

• Bag-of-Words: 52.5%

• Gupta et al., 2009: 78.9%

• Our model: 83.3%

[Figure: per-class classification accuracy (0.5 to 0.9) for classes such as cricket shot and tennis forehand, comparing bag-of-words (SIFT+SVM), Gupta et al. 2009, and our model]

Yao/Fei-Fei

Today: Recognizing human actions from ego-centric video

Detecting Activities of Daily Living in First-person Camera Views

Hamed Pirsiavash, Deva Ramanan

CVPR 2012

Hamed Pirsiavash

Motivation: a sample video of Activities of Daily Living

Applications: Tele-rehabilitation

• Kopp et al., Arch. of Physical Medicine and Rehabilitation, 1997.

• Catz et al., Spinal Cord, 1997.

Long-term at-home monitoring

Hamed Pirsiavash

Applications: Life-logging

• Gemmell et al., “MyLifeBits: a personal database for everything.” Communications of the ACM, 2006.

• Hodges et al., “SenseCam: A retrospective memory aid.” UbiComp, 2006.

So far, mostly “write-only” memory!

This is the right time for the computer vision community to get involved.

Hamed Pirsiavash

Wearable ADL detection

• ADL actions derived from medical literature on patient rehabilitation

• It is easy to collect natural data

Hamed Pirsiavash


Challenges: What features to use?

From low-level features (weak semantics) to high-level features (strong semantics):

• Space-time interest points (Laptev, IJCV’05)

• Object-centric features

• Human pose

Difficulties of pose:

• Detectors are not accurate enough

• Not useful in first-person camera views

Hamed Pirsiavash

Challenges: Long-scale temporal structure

• “Classic” data: boxing

• Wearable data: making tea, which unfolds over time as start boiling water → do other things (while waiting) → pour in cup → drink tea

Adapted from Hamed Pirsiavash

Appearance feature: bag of objects

Video clip → bag of detected objects (e.g., fridge, TV, stove) → SVM classifier

Hamed Pirsiavash
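A minimal sketch of the bag-of-objects feature, assuming an object detector has already produced a list of detected labels per frame. Counting hard detections and normalizing to sum to 1 are assumptions; the vocabulary and helper name are hypothetical, and a real system might pool detector confidences instead.

```python
from collections import Counter
from typing import Iterable, List


def bag_of_objects(frame_detections: Iterable[Iterable[str]], vocabulary: List[str]) -> List[float]:
    """Normalized histogram of detected object labels over all frames of a clip."""
    counts = Counter(label for frame in frame_detections for label in frame)
    total = sum(counts[v] for v in vocabulary)  # labels outside the vocabulary are ignored
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[v] / total for v in vocabulary]


vocab = ["fridge", "TV", "stove"]
clip = [["fridge", "stove"], ["stove"], ["TV", "cup"]]
print(bag_of_objects(clip, vocab))  # [0.25, 0.25, 0.5]
```

Every clip becomes a fixed-length vector regardless of its duration, which is what lets a standard SVM classify it; the cost is that all temporal order is discarded.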

Temporal pyramid

Coarse-to-fine correspondence matching with a multi-layer pyramid, inspired by “Spatial Pyramid” (CVPR’06) and “Pyramid Match Kernels” (ICCV’05)

Video clip → temporal pyramid descriptor (bins along time) → SVM classifier

Hamed Pirsiavash
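A minimal sketch of a temporal pyramid descriptor, assuming each frame already has a fixed-length feature vector (for instance, per-frame object scores). Level l splits the clip into 2^l equal time segments, each segment is summarized by the average of its frames, and all segments are concatenated. Average pooling, two levels, and equal weighting across levels are assumptions; the original method's pooling and level weights may differ.

```python
from typing import List, Sequence


def temporal_pyramid(frame_features: Sequence[Sequence[float]], levels: int = 2) -> List[float]:
    """Concatenate average-pooled features over 1, 2, 4, ... equal time segments."""
    n = len(frame_features)
    if n == 0:
        raise ValueError("clip has no frames")
    dim = len(frame_features[0])
    desc: List[float] = []
    for level in range(levels):
        segments = 2 ** level
        for s in range(segments):
            start = s * n // segments
            end = max((s + 1) * n // segments, start + 1)  # keep at least one frame
            seg = frame_features[start:end]
            desc.extend(sum(f[d] for f in seg) / len(seg) for d in range(dim))
    return desc


# Hypothetical per-frame scores for two objects, e.g. [fridge, stove], over 4 frames.
frames = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
print(temporal_pyramid(frames))  # [1.0, 1.5, 2.0, 0.0, 0.0, 3.0]
```

The top level is the plain bag over the whole clip; the finer level keeps coarse order, so "fridge early, stove late" produces a different descriptor from the reverse.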

Accuracy on 18 action categories:

• Our model: 40.6%

• STIP baseline: 22.8%

Hamed Pirsiavash


Summary: Human actions

• Action recognition still an open problem

– How to represent actions?

• Types of data: atomic and more complex actions, ego-centric video

• Common representations

– Space-time interest points

– Pose

– Objects (and temporal pyramids of objects)

• Pose

– Can be approached as a classification problem using depth data