Action Recognition Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem

Introduction to Computer Visionjbhuang/teaching/ece... · What is the purpose of action recognition? ... Moving corner


Action Recognition

Computer Vision

Jia-Bin Huang, Virginia Tech

Many slides from D. Hoiem


This section: advanced topics

•Convolutional neural networks in vision

•Action recognition

•Vision and Language

•3D Scenes and Context


What is an action?

Action: a transition from one state to another

• Who is the actor?

• How is the state of the actor changing?

• What (if anything) is being acted on?

• How is that thing changing?

• What is the purpose of the action (if any)?


How do we represent actions?

Categories: walking, hammering, dancing, skiing, sitting down, standing up, jumping

Poses

Nouns and Predicates: <man, swings, hammer>, <man, hits, nail, w/ hammer>
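
As a small illustration of the noun-and-predicate representation, the two examples on this slide can be written as structured records. The class name `ActionPredicate`, its field names, and the helper `describe` are hypothetical and chosen only for this sketch.

```python
from typing import NamedTuple, Optional


class ActionPredicate(NamedTuple):
    """One structured description of an action (hypothetical field names)."""
    actor: str
    verb: str
    target: Optional[str] = None      # what is being acted on, if anything
    instrument: Optional[str] = None  # tool used, if any


def describe(p: ActionPredicate) -> str:
    """Render a predicate in the angle-bracket notation used on the slide."""
    parts = [p.actor, p.verb]
    if p.target is not None:
        parts.append(p.target)
    if p.instrument is not None:
        parts.append("w/ " + p.instrument)
    return "<" + ", ".join(parts) + ">"


# The two examples from the slide.
swing = ActionPredicate("man", "swings", "hammer")
hit = ActionPredicate("man", "hits", "nail", instrument="hammer")

print(describe(swing))
print(describe(hit))
```

A flat category label such as "hammering" discards the actor, target, and instrument that this structure keeps explicit.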


What is the purpose of action recognition?

•To describe

•To predict

•To understand the intention and motivation

Predicting Motivations of Actions by Leveraging Text, CVPR 2016


How can we identify actions?

Motion

Pose

Held Objects

Nearby Objects


Representing Motion

Bobick and Davis 2001

Optical Flow with Motion History
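
A minimal sketch of a motion-history update may help here: pixels where motion is detected are set to a maximum value `tau`, and all other pixels decay by one per frame, so brighter values mean more recent motion. The motion test below uses simple frame differencing with a threshold, which is an assumption made for this example; the function name `update_mhi` and its parameters are also hypothetical.

```python
import numpy as np


def update_mhi(mhi, prev_frame, frame, tau=10, diff_threshold=0.1):
    """One motion-history step: moving pixels become tau, others decay by 1."""
    # Assumed motion detector: absolute frame difference above a threshold.
    moving = np.abs(frame - prev_frame) > diff_threshold
    return np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))


# Toy sequence: a bright 2x2 block sliding right across an 8x8 image.
frames = np.zeros((5, 8, 8))
for t in range(5):
    frames[t, 3:5, t:t + 2] = 1.0

mhi = np.zeros((8, 8))
for t in range(1, 5):
    mhi = update_mhi(mhi, frames[t - 1], frames[t], tau=10)

print(mhi[3])  # one row through the block's path; larger means more recent
```

The resulting single image summarizes where and how recently motion occurred, and can be fed to an ordinary image classifier.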


Representing Motion

Efros et al. 2003

Optical Flow with Split Channels

optical flow -> split into pos/neg channels -> blurred pos/neg flow
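
The split-and-blur step can be sketched as follows: each flow component is half-wave rectified into a positive and a negative channel, and every channel is then blurred. The Gaussian width `sigma`, the helper names, and the random test flow are assumptions for illustration, not values from the slide.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel covering about three standard deviations."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def blur(channel, sigma=1.5):
    """Separable Gaussian blur (zero padded); image must be larger than the kernel."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, channel)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)


def split_flow_channels(fx, fy, sigma=1.5):
    """Return blurred (fx+, fx-, fy+, fy-) channels, all non-negative."""
    channels = [np.maximum(fx, 0), np.maximum(-fx, 0),
                np.maximum(fy, 0), np.maximum(-fy, 0)]
    return np.stack([blur(c, sigma) for c in channels])


rng = np.random.default_rng(0)
fx = rng.normal(size=(16, 16))  # stand-in for horizontal flow
fy = rng.normal(size=(16, 16))  # stand-in for vertical flow
channels = split_flow_channels(fx, fy)
print(channels.shape)
```

Rectifying before blurring keeps opposite motions from cancelling each other out when the channels are smoothed.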


Representing Motion

Tracked Points

Matikainen et al. 2009


Representing Motion

Space-Time Interest Points

Corner detectors in space-time

Laptev 2005

• Moving corner

• Ball hits wall

• Balls collide

• Balls collide (different scale)
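
A rough sketch of a corner detector in space-time: build a 3x3 second-moment matrix from smoothed x, y, and t gradients at every voxel, then score it with a Harris-style response, det(M) - k * trace(M)^3. This sketch assumes box smoothing in place of Gaussian smoothing and k = 0.005; both choices, and the function names, are assumptions of this example rather than details given on the slide.

```python
import numpy as np


def smooth(volume, width=3):
    """Box-filter an array along every axis with edge padding (Gaussian stand-in)."""
    k = np.ones(width) / width
    pad = width // 2
    for axis in range(volume.ndim):
        pads = [(pad, pad) if a == axis else (0, 0) for a in range(volume.ndim)]
        padded = np.pad(volume, pads, mode="edge")
        volume = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="valid"), axis, padded)
    return volume


def spacetime_harris(video, k=0.005):
    """Corner response det(M) - k * trace(M)^3 for a (t, y, x) volume."""
    lt, ly, lx = np.gradient(smooth(video))
    grads = (lx, ly, lt)
    # Smoothed 3x3 second-moment matrix at every voxel.
    M = np.empty(video.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            M[..., i, j] = smooth(grads[i] * grads[j])
    return np.linalg.det(M) - k * np.trace(M, axis1=-2, axis2=-1) ** 3


# A square that is present in every frame: spatial corners, no temporal change.
static = np.zeros((12, 12, 12))
static[:, 4:8, 4:8] = 1.0

# The same square appearing suddenly at frame 6: corners in space and time.
appearing = np.zeros((12, 12, 12))
appearing[6:, 4:8, 4:8] = 1.0

print(spacetime_harris(static).max(), spacetime_harris(appearing).max())
```

The static square has no temporal gradient, so its response never becomes positive, while the square that appears mid-sequence produces a positive response near its space-time corners.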


Representing Motion

Space-Time Interest Points

Laptev 2005


Examples of Action Recognition Systems

•Feature-based classification

•Recognition using pose and objects


Action recognition as classification

Retrieving actions in movies, Laptev and Perez, 2007


Remember image categorization…

Training: Training Images -> Image Features -> Classifier Training (with Training Labels) -> Trained Classifier

Testing: Test Image -> Image Features -> Trained Classifier -> Prediction (e.g., Outdoor)


Remember spatial pyramids…

Compute histogram in each spatial bin
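
As a reminder of how this works, here is a minimal sketch that counts visual words in each cell of a 1x1 grid, a 2x2 grid, and so on, then concatenates the counts. It assumes features have already been quantized to integer visual-word ids with positions scaled to [0, 1); the function name and arguments are hypothetical, and the per-level weighting used in some pyramid variants is omitted.

```python
import numpy as np


def spatial_pyramid(words, xy, vocab_size, levels=2):
    """Concatenate visual-word histograms over 1x1, 2x2, ... grids.

    words: (n,) integer visual-word ids; xy: (n, 2) positions in [0, 1).
    """
    feats = []
    for level in range(levels):
        cells = 2 ** level
        # Grid cell of every point at this level.
        cell_xy = np.minimum((xy * cells).astype(int), cells - 1)
        cell_id = cell_xy[:, 1] * cells + cell_xy[:, 0]
        hist = np.zeros((cells * cells, vocab_size))
        np.add.at(hist, (cell_id, words), 1)  # count each word in its cell
        feats.append(hist.ravel())
    return np.concatenate(feats)


words = np.array([0, 1, 1, 2])
xy = np.array([[0.1, 0.1], [0.9, 0.1], [0.1, 0.9], [0.9, 0.9]])
feature = spatial_pyramid(words, xy, vocab_size=3, levels=2)
print(feature)
```

The first block of the vector is the ordinary bag-of-words histogram; the later blocks add coarse layout information.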


Features for Classifying Actions

1. Spatio-temporal pyramids

• Image Gradients

• Optical Flow


Features for Classifying Actions

2. Spatio-temporal interest points

Corner detectors in space-time

Descriptors based on Gaussian derivative filters over x, y, time


Classification

•Boosted stumps for pyramids of optical flow, gradient

•Nearest neighbor for STIP
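
The nearest-neighbor option can be sketched in a few lines: a query descriptor takes the label of the closest training descriptor. Euclidean distance, the function name, and the toy data are assumptions of this example; the slide does not state which distance was used.

```python
import numpy as np


def nearest_neighbor_label(query, train_feats, train_labels):
    """Return the label of the training feature closest to the query."""
    dists = np.linalg.norm(train_feats - query, axis=1)  # assumed Euclidean
    return train_labels[int(np.argmin(dists))]


# Toy 2D descriptors with made-up labels.
train_feats = np.array([[0.0, 0.0], [1.0, 1.0]])
train_labels = ["other", "drinking"]
print(nearest_neighbor_label(np.array([0.9, 0.8]), train_feats, train_labels))
```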


Searching the video for an action

1. Detect keyframes using a trained HOG detector in each frame

2. Classify detected keyframes as positive (e.g., “drinking”) or negative (“other”)


Accuracy in searching video

Without keyframe detection

With keyframe detection


Learning realistic human actions from movies, Laptev et al. 2008

“Talk on phone”

“Get out of car”


Approach

•Space-time interest point detectors

•Descriptors: HOG, HOF

•Pyramid histograms (3x3x2)

•SVMs with Chi-Squared Kernel

Interest Points

Spatio-Temporal Binning
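
The binning and kernel steps above can be sketched as follows, assuming interest-point descriptors have already been quantized to integer visual-word ids and their (x, y, t) coordinates scaled to [0, 1). The function names, the normalization, and the `scale` parameter of the kernel are assumptions for this example. In practice the kernel matrix would be passed to an SVM that accepts precomputed kernels.

```python
import numpy as np


def grid_histogram(words, xyt, vocab_size, grid=(3, 3, 2)):
    """Normalized visual-word histogram in each cell of an x-by-y-by-t grid.

    words: (n,) integer ids; xyt: (n, 3) coordinates scaled to [0, 1).
    """
    g = np.array(grid)
    cell = np.minimum((xyt * g).astype(int), g - 1)
    cell_id = np.ravel_multi_index(tuple(cell.T), grid)
    hist = np.zeros((int(g.prod()), vocab_size))
    np.add.at(hist, (cell_id, words), 1)
    return hist.ravel() / max(len(words), 1)


def chi2_kernel(h1, h2, scale=1.0, eps=1e-12):
    """exp(-D / scale), where D is the chi-squared distance between histograms."""
    d = 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
    return float(np.exp(-d / scale))


xyt = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]])
h_a = grid_histogram(np.array([0, 1]), xyt, vocab_size=2)
h_b = grid_histogram(np.array([1, 0]), xyt, vocab_size=2)
print(chi2_kernel(h_a, h_a), chi2_kernel(h_a, h_b))
```

Identical histograms give a kernel value of 1, and the value falls toward 0 as the histograms diverge.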


Results


Action Recognition using Pose and Objects

Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities, B. Yao and Li Fei-Fei, 2010

Slide Credit: Yao/Fei-Fei


Human-Object Interaction

Holistic image based classification

Integrated reasoning

• Human pose estimation (Head, Torso)

• Object detection (Tennis racket)

• Action categorization (Activity: Tennis Forehand)

Slide Credit: Yao/Fei-Fei


Human pose estimation & Object detection

Human pose estimation is challenging.

Difficult part appearance

Self-occlusion

Image region looks like a body part

• Felzenszwalb & Huttenlocher, 2005

• Ren et al, 2005

• Ramanan, 2006

• Ferrari et al, 2008

• Yang & Mori, 2008

• Andriluka et al, 2009

• Eichner & Ferrari, 2009

Slide Credit: Yao/Fei-Fei


Human pose estimation & Object detection

Facilitate: human pose estimation becomes easier given that the object is detected.

Slide Credit: Yao/Fei-Fei


Human pose estimation & Object detection

Object detection is challenging.

Small, low-resolution, partially occluded

Image region similar to detection target

• Viola & Jones, 2001

• Lampert et al, 2008

• Divvala et al, 2009

• Vedaldi et al, 2009

Slide Credit: Yao/Fei-Fei


Human pose estimation & Object detection

Facilitate: object detection becomes easier given that the pose is estimated.

Slide Credit: Yao/Fei-Fei


Human pose estimation & Object detection

Mutual Context

Slide Credit: Yao/Fei-Fei


Mutual Context Model Representation

A: Activity (e.g., croquet shot, volleyball smash, tennis forehand)

H: Human pose

• More than one H for each A (intra-class variations);

• Unobserved during training.

O: Object (e.g., croquet mallet, volleyball, tennis racket)

P: Body parts P1, P2, ..., PN

lP: location; θP: orientation; sP: scale.

f: Image evidence (fO for the object; f1, f2, ..., fN for the body parts)

f: Shape context. [Belongie et al, 2002]

Slide Credit: Yao/Fei-Fei


Learning Results

Cricket defensive shot

Cricket bowling

Croquet shot

Slide Credit: Yao/Fei-Fei


Learning Results

Tennis serve

Volleyball smash

Tennis forehand

Slide Credit: Yao/Fei-Fei


Model Inference

I (image)

The learned models

Head detection

Torso detection

Tennis racket detection

Layout of the object and body parts.

Compositional Inference [Chen et al, 2007]

$A_1, H_1, O_1^{*}, \{P_{1,n}^{*}\}_{n}$

Slide Credit: Yao/Fei-Fei


Model Inference

I (image)

The learned models

$A_1, H_1, O_1^{*}, \{P_{1,n}^{*}\}_{n}$ ... $A_K, H_K, O_K^{*}, \{P_{K,n}^{*}\}_{n}$

Output

Slide Credit: Yao/Fei-Fei


Dataset and Experiment Setup

Sport data set: 6 classes [Gupta et al, 2009]

• Cricket defensive shot

• Cricket bowling

• Croquet shot

• Tennis forehand

• Tennis serve

• Volleyball smash

180 training (supervised with object and part locations) & 120 testing images

Tasks:

• Object detection;

• Pose estimation;

• Activity classification.

Slide Credit: Yao/Fei-Fei


Object Detection Results

[Figure: precision-recall curves (x-axis: Recall, y-axis: Precision) with panels for Cricket bat, Croquet mallet, Tennis racket, Volleyball, and Cricket ball. Legend: Our Method, Sliding window, Pedestrian context. Baselines cited: [Andriluka et al, 2009], [Dalal & Triggs, 2006]. A "Valid region" is marked.]

Slide Credit: Yao/Fei-Fei


Object Detection Results

[Figure: precision-recall curves (x-axis: Recall, y-axis: Precision) for Volleyball and Cricket ball, comparing Our Method, Pedestrian as context, and Scanning window detector. Example detections from Sliding window, Pedestrian context, and Our method are shown for two difficult cases: small object and background clutter.]

Slide Credit: Yao/Fei-Fei


Human Pose Estimation Results

Method                   Torso   Upper Leg    Lower Leg    Upper Arm    Lower Arm    Head
Ramanan, 2006            .52     .22  .22     .21  .28     .24  .28     .17  .14     .42
Andriluka et al, 2009    .50     .31  .30     .31  .27     .18  .19     .11  .11     .45
Our full model           .66     .43  .39     .44  .34     .44  .40     .27  .29     .58

Slide Credit: Yao/Fei-Fei


Human Pose Estimation Results


[Figure: pose estimates from Andriluka et al, 2009 compared with our estimation result, shown for the tennis serve model and the volleyball smash model.]

Slide Credit: Yao/Fei-Fei


Human Pose Estimation Results

Method                   Torso   Upper Leg    Lower Leg    Upper Arm    Lower Arm    Head
Ramanan, 2006            .52     .22  .22     .21  .28     .24  .28     .17  .14     .42
Andriluka et al, 2009    .50     .31  .30     .31  .27     .18  .19     .11  .11     .45
Our full model           .66     .43  .39     .44  .34     .44  .40     .27  .29     .58
One pose per class       .63     .40  .36     .41  .31     .38  .35     .21  .23     .52

[Figure: four example estimation results.]

Slide Credit: Yao/Fei-Fei


Activity Classification Results

Classification accuracy:

Method                     Accuracy
Our model                  83.3%
Gupta et al, 2009          78.9%
Bag-of-Words               52.5%

[Figure: bar chart of classification accuracy (axis from 0.5 to 0.9) comparing Bag-of-words SIFT+SVM, Gupta et al, 2009, and Our model; example images: Cricket shot, Tennis forehand.]

Slide Credit: Yao/Fei-Fei


Motion features – Dense Trajectory

Action Recognition by Dense Trajectories, CVPR 2011

Action Recognition with Improved Trajectories, ICCV 2013


Video classification with CNNs

Large-scale Video Classification with Convolutional Neural Networks, CVPR 2014


Two-stream CNN

Two-Stream Convolutional Networks for Action Recognition in Videos, NIPS 2014
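
One simple way to combine the two streams is late fusion: a spatial network scores RGB frames, a temporal network scores stacked optical flow, and the per-class softmax scores are averaged. The sketch below shows only that fusion step on made-up scores; the function names and the `temporal_weight` parameter are assumptions, and the networks themselves are not modeled.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over the last axis."""
    z = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fuse_two_stream(spatial_scores, temporal_scores, temporal_weight=0.5):
    """Weighted average of the class probabilities from the two streams."""
    p_spatial = softmax(np.asarray(spatial_scores, dtype=float))
    p_temporal = softmax(np.asarray(temporal_scores, dtype=float))
    return (1 - temporal_weight) * p_spatial + temporal_weight * p_temporal


# Made-up raw class scores for a two-class problem.
fused = fuse_two_stream([2.0, 0.0], [0.0, 1.0])
print(fused, int(np.argmax(fused)))
```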


3D Convolutional Networks

Learning Spatiotemporal Features with 3D Convolutional Networks, ICCV 2015
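
To make the idea concrete, here is a naive sketch of a single 3D convolution: the kernel slides over time as well as over x and y, so one filter can respond to motion. It handles one channel with no padding or stride, and the function name and toy clip are assumptions; real networks stack many such filters with learned weights.

```python
import numpy as np


def conv3d_valid(clip, kernel):
    """Cross-correlate a (t, y, x) clip with a (kt, ky, kx) kernel, no padding."""
    kt, ky, kx = kernel.shape
    T, H, W = clip.shape
    out = np.zeros((T - kt + 1, H - ky + 1, W - kx + 1))
    # Accumulate one shifted copy of the clip per kernel tap.
    for dt in range(kt):
        for dy in range(ky):
            for dx in range(kx):
                out += kernel[dt, dy, dx] * clip[dt:dt + out.shape[0],
                                                 dy:dy + out.shape[1],
                                                 dx:dx + out.shape[2]]
    return out


clip = np.ones((16, 8, 8))              # a static 16-frame clip
box = np.ones((3, 3, 3))                # averaging-style kernel
temporal_diff = np.zeros((3, 3, 3))     # responds only to change over time
temporal_diff[0, 1, 1], temporal_diff[2, 1, 1] = -1.0, 1.0

print(conv3d_valid(clip, box).shape)
print(np.abs(conv3d_valid(clip, temporal_diff)).max())
```

A 2D convolution applied frame by frame could not implement the temporal-difference filter, which is the motivation for convolving over time as well.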


Action recognition -> Semantic role Labeling


Take-home messages

•Action recognition is an open problem.

• How to define actions?

• How to infer them?

• What are good visual cues?

• How do we incorporate higher level reasoning?


Take-home messages

•Some work done, but it is just the beginning of exploring the problem. So far…

• Actions are mainly categorical (could be framed in terms of effect or intent)

• Most approaches are classification using simple features (spatial-temporal histograms of gradients or flow, s-t interest points, SIFT in images)

• Just a couple of works on how to incorporate pose and objects

• Not much idea of how to reason about long-term activities or to describe video sequences