Spatiotemporal Graphs for Object Segmentation, Human Pose Estimation and Action Detection in Videos Mubarak Shah Center for Research in Computer Vision University of Central Florida

Page 1: Spatiotemporal Graphs for Object Segmentation, Human Pose

Spatiotemporal Graphs for Object Segmentation, Human Pose Estimation and

Action Detection in Videos

Mubarak Shah

Center for Research in Computer Vision

University of Central Florida

Page 2: Spatiotemporal Graphs for Object Segmentation, Human Pose

Spatiotemporal Graphs (STG)

• Video-based problems

• Nodes and edges

• Spatiotemporal

• Type I

• Type II

Page 3: Spatiotemporal Graphs for Object Segmentation, Human Pose

Type I Spatiotemporal Graph (STG)

• Nodes represent entities in single frames

[Figure: graph with one layer of nodes per frame: Frame 1, Frame 2, Frame 3, ...]

Nodes can be: object proposals, pixels, super-pixels, object locations, …

Edges can be: color similarities, distances, shape similarities, …

Page 4: Spatiotemporal Graphs for Object Segmentation, Human Pose

Type II Spatiotemporal Graph (STG)

• Nodes represent entities in multiple frames

Nodes can be: object tracklets, super-voxels, …

Edges can be: appearance similarities, motion models, overlaps, …

Page 5: Spatiotemporal Graphs for Object Segmentation, Human Pose

Examples of Spatiotemporal Graphs

Page 6: Spatiotemporal Graphs for Object Segmentation, Human Pose

Original Video Object Segmentation

Video Object Segmentation (VOS)

Page 7: Spatiotemporal Graphs for Object Segmentation, Human Pose

Spatiotemporal Graph (STG): Video Object Segmentation

[Figure: layered graph over Frame i-1, Frame i, Frame i+1, with source node s and sink node t]

Page 8: Spatiotemporal Graphs for Object Segmentation, Human Pose

Video Object Co-Segmentation (VOCS)

Page 9: Spatiotemporal Graphs for Object Segmentation, Human Pose

STG – Video Object Co-Segmentation

[Figure: graph over tracklets from Video 1 and tracklets from Video 2]

Page 10: Spatiotemporal Graphs for Object Segmentation, Human Pose

Human Pose Estimation in Videos (HPEV)

Page 11: Spatiotemporal Graphs for Object Segmentation, Human Pose

STG – Human Pose Estimation in Videos

[Figure: graph over body-part nodes: Head Top, Head Bottom, Shoulder, Elbow, Hand, Hip, Knee, Ankle, …]

Page 12: Spatiotemporal Graphs for Object Segmentation, Human Pose

Human Action Detection (HAD)

Diving

Page 13: Spatiotemporal Graphs for Object Segmentation, Human Pose

Spatiotemporal Context Graphs for Training Videos

Training Videos for Action c: Video 1 … Video n

Context Graphs: G1(V1, E1) … Gn(Vn, En)

Composite Graph

Page 14: Spatiotemporal Graphs for Object Segmentation, Human Pose

Outline

• Video Object Segmentation (VOS)

• Video Object Co-Segmentation (VOCS)

• Human Pose Estimation in Videos (HPEV)

• Human Action Detection (HAD)

Page 15: Spatiotemporal Graphs for Object Segmentation, Human Pose

Video Object Segmentation (VOS)

Dong Zhang, Omar Javed, and Mubarak Shah, “Video object segmentation through spatially accurate and temporally dense extraction of primary object regions”, CVPR, 2013

Page 16: Spatiotemporal Graphs for Object Segmentation, Human Pose

Video Object Segmentation (VOS)

• Applications
  • Object Recognition
  • Activity Recognition
  • Surveillance

Page 17: Spatiotemporal Graphs for Object Segmentation, Human Pose

Video Object Segmentation (VOS)

• Challenges
  • Camera movements
  • Varieties of objects
  • Deformable objects

Page 18: Spatiotemporal Graphs for Object Segmentation, Human Pose

Framework

1. Input Video
2. Object Proposal Generation
3. Spatiotemporal Graph for Object Selection
4. GMMs and MRF based Optimization
5. Object Segmentation

Page 19: Spatiotemporal Graphs for Object Segmentation, Human Pose

Object Proposal Generation

• Object proposal methods [1,2]

[1] Endres, I. and Hoiem, D., “Category Independent Object Proposals”, ECCV, 2010

[2] Alexe, B., Deselaers, T. and Ferrari, V., “What is an object?”, CVPR, 2010

Page 20: Spatiotemporal Graphs for Object Segmentation, Human Pose

[Figure: SegTrack (monkeydog); for each frame index, object proposals ranked 1 to 100]

Ranked object proposals

Sample a lot of proposals! Select the right ones!

Page 21: Spatiotemporal Graphs for Object Segmentation, Human Pose

[Figure: SegTrack (parachute); for each frame index, object proposals ranked 1 to 100]

Ranked object proposals expansion

Multiple proposals

Page 22: Spatiotemporal Graphs for Object Segmentation, Human Pose

Spatiotemporal Graph for Object Selection

Page 23: Spatiotemporal Graphs for Object Segmentation, Human Pose

Unary Edge

• Each object proposal is drawn as a beginning node and an ending node

• The unary edge between them represents object-ness

Page 24: Spatiotemporal Graphs for Object Segmentation, Human Pose

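Unary Edge: Score

\[ S_{unary}(r) = M(r) + A(r) \]

A(r): appearance score (objectness)

M(r): average Frobenius norm of the optical flow gradient

\[ \|U_X\|_F = \left\| \begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix} \right\|_F = \sqrt{u_x^2 + u_y^2 + v_x^2 + v_y^2} \]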
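A minimal sketch of the motion term M(r), assuming the flow fields are given as plain 2-D lists and that the average is taken over the proposal's boundary pixels (the next slide shows the optical flow gradient around the boundary). The forward-difference derivative and the names `flow_gradient_norm` and `motion_score` are illustrative assumptions, not taken from the talk.

```python
import math


def flow_gradient_norm(u, v):
    """Per-pixel Frobenius norm of the optical-flow gradient.

    u, v: 2-D lists (rows x cols) with horizontal and vertical flow.
    Derivatives are forward differences, set to zero on the last
    row/column (an assumption; the slides do not name a filter).
    """
    rows, cols = len(u), len(u[0])
    norm = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            ux = u[y][x + 1] - u[y][x] if x + 1 < cols else 0.0
            uy = u[y + 1][x] - u[y][x] if y + 1 < rows else 0.0
            vx = v[y][x + 1] - v[y][x] if x + 1 < cols else 0.0
            vy = v[y + 1][x] - v[y][x] if y + 1 < rows else 0.0
            norm[y][x] = math.sqrt(ux ** 2 + uy ** 2 + vx ** 2 + vy ** 2)
    return norm


def motion_score(u, v, boundary_pixels):
    """M(r): mean gradient norm over a list of (row, col) boundary pixels."""
    if not boundary_pixels:
        return 0.0
    norm = flow_gradient_norm(u, v)
    return sum(norm[y][x] for y, x in boundary_pixels) / len(boundary_pixels)
```

A proposal whose boundary coincides with a motion discontinuity gets a high M(r); a proposal lying inside a uniformly moving region gets a score near zero.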

Page 25: Spatiotemporal Graphs for Object Segmentation, Human Pose

Unary Edge: Motion Score

[Figure panels: original video frame; optical flow; optical flow gradient; object region (proposal); boundary region; OF gradient around boundary]

Page 26: Spatiotemporal Graphs for Object Segmentation, Human Pose

Binary Edges

[Figure: binary edges connecting proposals in Frame i, Frame i+1 and Frame i+2]

Page 27: Spatiotemporal Graphs for Object Segmentation, Human Pose

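Binary Edge Score

\[ S_{binary}(r_m, r_n) = \lambda \cdot S_{overlap}(r_m, r_n) \cdot S_{color}(r_m, r_n) \]

\[ S_{color}(r_m, r_n) = hist(r_m) \cdot hist(r_n)^T \]

\[ S_{overlap}(r_m, r_n) = \frac{|r_m \cap warp_{mn}(r_n)|}{|r_m \cup warp_{mn}(r_n)|} \]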
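The binary edge score can be sketched as follows, assuming regions are sets of (row, col) pixels, that r_n has already been warped into r_m's frame by optical flow, and that color histograms are normalized lists. The function names and the default `lam=1.0` are illustrative assumptions.

```python
def overlap_score(region_m, warped_region_n):
    """Intersection-over-union of r_m and the warped r_n (sets of pixels)."""
    union = region_m | warped_region_n
    if not union:
        return 0.0
    return len(region_m & warped_region_n) / len(union)


def color_score(hist_m, hist_n):
    """Dot product of two color histograms of equal length."""
    return sum(a * b for a, b in zip(hist_m, hist_n))


def binary_score(region_m, warped_region_n, hist_m, hist_n, lam=1.0):
    """S_binary = lambda * S_overlap * S_color."""
    return (lam
            * overlap_score(region_m, warped_region_n)
            * color_score(hist_m, hist_n))
```

Because the two terms are multiplied, an edge scores highly only when the proposals both overlap after warping and look alike in color.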

Page 28: Spatiotemporal Graphs for Object Segmentation, Human Pose

Final Spatiotemporal Graph

[Figure: layered DAG over Frame i-1, Frame i, Frame i+1, with source node s and sink node t]

Goal: Find only one object proposal from each frame, such that all of them have high object-ness and high similarity across frames.

• Find the highest weighted path in the DAG.

• Longest path problem on a DAG: dynamic programming solution.
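The highest-weighted path can be found with a Viterbi-style dynamic program. The sketch below is a simplification: it assumes binary edges only between consecutive frames (the Binary Edges slide also draws edges that skip a frame), and the function name and argument layout are hypothetical.

```python
def best_proposal_path(unary, binary):
    """Pick one proposal per frame maximizing unary + binary scores.

    unary[f][k]: unary score of proposal k in frame f.
    binary(f, j, k): score of the edge from proposal j in frame f
                     to proposal k in frame f + 1.
    Returns (total_score, list of chosen proposal indices).
    """
    n_frames = len(unary)
    best = [list(unary[0])]            # best[f][k]: best score ending at (f, k)
    back = [[None] * len(unary[0])]    # back[f][k]: predecessor index in f - 1
    for f in range(1, n_frames):
        row, ptr = [], []
        for k, u in enumerate(unary[f]):
            j_best = max(range(len(unary[f - 1])),
                         key=lambda j: best[f - 1][j] + binary(f - 1, j, k))
            row.append(best[f - 1][j_best] + binary(f - 1, j_best, k) + u)
            ptr.append(j_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the best final node.
    k = max(range(len(best[-1])), key=lambda i: best[-1][i])
    total = best[-1][k]
    path = [k]
    for f in range(n_frames - 1, 0, -1):
        k = back[f][k]
        path.append(k)
    path.reverse()
    return total, path
```

With F frames and at most K proposals per frame the cost is O(F·K²), which is what makes sampling many proposals per frame affordable.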

Page 29: Spatiotemporal Graphs for Object Segmentation, Human Pose

Results

Page 30: Spatiotemporal Graphs for Object Segmentation, Human Pose

Qualitative Results – “Girl”

Original video Ground truth

Selected object proposals Segmentation results

Region within the red boundary is the object region

Page 31: Spatiotemporal Graphs for Object Segmentation, Human Pose

Qualitative Results – “Parachute”

Original video Ground truth

Selected object proposals Segmentation results

Region within the red boundary is the object region

Page 32: Spatiotemporal Graphs for Object Segmentation, Human Pose

Qualitative Results – “Birdfall”

Original video Ground truth Segmentation results

Region within the red boundary is the object region

Page 33: Spatiotemporal Graphs for Object Segmentation, Human Pose

Original video Ground truth Segmentation results

Qualitative Results – “Cheetah”

Region within the red boundary is the object region

Page 34: Spatiotemporal Graphs for Object Segmentation, Human Pose

Original video Ground truth Segmentation results

Qualitative Results – “Monkeydog”

Region within the red boundary is the object region

Page 35: Spatiotemporal Graphs for Object Segmentation, Human Pose

SegTrack: Quantitative Results*

Sequence     Ours   [14]   [13]   [20]   [6]
Use GTs?     N      N      N      Y      Y
Birdfall     155    189    288    252    454
Cheetah      633    806    905    1142   1217
Girl         1488   1698   1785   1304   1755
Monkeydog    365    472    521    533    683
Parachute    220    221    201    235    502
Avg.         452    542    592    594    791

* Average per-frame pixel error rate. The smaller, the better.
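One common reading of an average per-frame pixel error is the number of mislabelled pixels (the symmetric difference between the predicted and ground-truth masks), averaged over frames. The sketch below assumes that reading and represents each mask as a set of (row, col) pixels; the function name is hypothetical.

```python
def average_pixel_error(pred_masks, truth_masks):
    """Mean number of mislabelled pixels per frame.

    pred_masks, truth_masks: equal-length lists with one set of
    (row, col) foreground pixels per frame.
    """
    if len(pred_masks) != len(truth_masks):
        raise ValueError("need one ground-truth mask per predicted mask")
    if not pred_masks:
        return 0.0
    wrong = sum(len(p ^ t) for p, t in zip(pred_masks, truth_masks))
    return wrong / len(pred_masks)
```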

Page 36: Spatiotemporal Graphs for Object Segmentation, Human Pose

Summary

• STG selects the moving object across frames

• STG-based selection leads to pixel-level segmentation

• Performance improved by ~20%

Page 38: Spatiotemporal Graphs for Object Segmentation, Human Pose

How about multiple videos?

Page 39: Spatiotemporal Graphs for Object Segmentation, Human Pose

Video Object Co-Segmentation (VOCS)

Dong Zhang, Omar Javed, and Mubarak Shah, “Video object co-segmentation by regulated maximum weight cliques”, ECCV, 2014

Page 40: Spatiotemporal Graphs for Object Segmentation, Human Pose

Video Object Co-Segmentation (VOCS)

• Applications
  • Automatic Annotation
  • Unsupervised object detection & recognition
  • Re-Identification

[Figure labels: Training image, Annotation, Testing image]

Page 41: Spatiotemporal Graphs for Object Segmentation, Human Pose

Video Object Co-Segmentation (VOCS)

• Challenges
  • Appearance variation
  • Multiple object classes
  • High complexity

Page 42: Spatiotemporal Graphs for Object Segmentation, Human Pose

Framework

1. Input Videos
2. Object Proposal Tracklets Generation
3. Regulated Maximum Weight Cliques for Tracklets
4. MRF based Optimization
5. Object Co-Segmentation

Page 43: Spatiotemporal Graphs for Object Segmentation, Human Pose

Object Proposal Tracklets Generation

Page 44: Spatiotemporal Graphs for Object Segmentation, Human Pose

[Figure: object proposals extracted from the frames of a video]

Page 45: Spatiotemporal Graphs for Object Segmentation, Human Pose

[Figure: object proposals; each proposal (e.g. Frame 31 track 1, Frame 31 track 2) is tracked backward and forward]

\[ S_{simi}(x_m, x_n) = S_{app}(x_m, x_n) \cdot S_{loc}(x_m, x_n) \cdot S_{shape}(x_m, x_n) \]

Page 46: Spatiotemporal Graphs for Object Segmentation, Human Pose

[Figure: tracklets started from Frame 31 track 1, Frame 31 track 2, …, Frame 61 track 2]

… for all proposals, in all frames

Page 47: Spatiotemporal Graphs for Object Segmentation, Human Pose

Regulated Maximum Weight Cliques for Tracklets

Page 48: Spatiotemporal Graphs for Object Segmentation, Human Pose

[Figure: tracklets from Video 1 and Video 2 grouped into cliques C1 and C2]

Clique 1: all chickens

Clique 2: all turtles

• Each tracklet is a node

• Node weight: \( W(X) = \sum_{i=1}^{f} S_{object}(x_i) \)

• Find Regulated Maximum Weight Cliques by our modified Bron-Kerbosch Algorithm
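As a point of reference, here is a sketch of the standard Bron-Kerbosch algorithm with pivoting, used to return the heaviest maximal clique. It is not the modified, "regulated" variant from the talk, whose details the slides do not give. It assumes strictly positive node weights, so the maximum-weight clique is always a maximal clique; all names are illustrative.

```python
def tracklet_weight(object_scores):
    """W(X): sum of the object scores of the proposals in one tracklet."""
    return sum(object_scores)


def max_weight_clique(adj, weight):
    """Heaviest clique via Bron-Kerbosch with pivoting.

    adj: dict mapping node -> set of neighbouring nodes (undirected).
    weight: dict mapping node -> positive weight.
    Returns (clique as a set, total weight).
    """
    best = {"clique": set(), "weight": 0.0}

    def expand(r, p, x):
        if not p and not x:
            # r is a maximal clique; keep it if it is the heaviest so far.
            w = sum(weight[n] for n in r)
            if w > best["weight"]:
                best["clique"], best["weight"] = set(r), w
            return
        # Pivot on the node with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for n in list(p - adj[pivot]):
            expand(r | {n}, p & adj[n], x & adj[n])
            p = p - {n}
            x = x | {n}

    expand(set(), set(adj), set())
    return best["clique"], best["weight"]
```

In the co-segmentation setting each node would be a tracklet, an edge would link two sufficiently similar tracklets, and a heavy clique would group tracklets of the same object across videos.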

Page 49: Spatiotemporal Graphs for Object Segmentation, Human Pose

Results

Page 50: Spatiotemporal Graphs for Object Segmentation, Human Pose

Chicken & Turtle

Red: first object Green: second object

Original Videos CoSegmentation Results

Page 51: Spatiotemporal Graphs for Object Segmentation, Human Pose

Elephant & Giraffe

Red: first object Green: second object

Original Videos CoSegmentation Results

Page 52: Spatiotemporal Graphs for Object Segmentation, Human Pose

Lion & Zebra

Red: first object Green: second object

[Two panels, each showing: Original Videos, CoSegmentation Results]

Page 55: Spatiotemporal Graphs for Object Segmentation, Human Pose

Quantitative Results: MOViCS Dataset

Video Set          Ours1   Ours2   VCS[4]   ICS[13]
Chicken&turtle     0.860   0.860   0.65     0.08
Zebra&lion         0.588   0.636   0.48     0.23
Giraffe&elephant   0.528   0.639   0.52     0.07
Tiger              0.336   0.336   0.30     0.30
Overall            0.578   0.617   0.49     0.17

Ours1: same parameters for all video sets
Ours2: different parameters for each video set

Numbers are intersection-over-union scores; the larger, the better.

Page 56: Spatiotemporal Graphs for Object Segmentation, Human Pose

Summary

• Type I STG for object segmentation

• Type II STG for object co-segmentation

• Results improved more than 20%

Page 58: Spatiotemporal Graphs for Object Segmentation, Human Pose

What is the most important object?

Human!

Page 59: Spatiotemporal Graphs for Object Segmentation, Human Pose

Human Pose Estimation in Videos (HPEV)

Dong Zhang and Mubarak Shah, “Human Pose Estimation in Videos”, ICCV, 2015

Dong Zhang and Mubarak Shah, “A Framework for Human Pose Estimation in Videos” (submitted), PAMI, 2016

Page 60: Spatiotemporal Graphs for Object Segmentation, Human Pose

An Example for Human Segmentation

Coarse segmentation

Page 61: Spatiotemporal Graphs for Object Segmentation, Human Pose

Pose Estimation

Page 62: Spatiotemporal Graphs for Object Segmentation, Human Pose

Human Pose Estimation in Videos (HPEV)

• Applications
  • Action recognition
  • HCI
  • Surveillance

Page 63: Spatiotemporal Graphs for Object Segmentation, Human Pose

Human Pose Estimation in Videos (HPEV)

• Challenges
  • Huge appearance variation
  • Multiple people
  • Consistent estimation

Page 64: Spatiotemporal Graphs for Object Segmentation, Human Pose

Framework

1. Input Videos
2. Pose Hypotheses Generation
3. Body Part Hypotheses Generation
4. Body Part Tracking
5. Tree-based Pose Estimation

Page 65: Spatiotemporal Graphs for Object Segmentation, Human Pose

[Figure: body-part graph over Frame f, Frame f+1, Frame f+2, with body-part nodes, intra-frame edges and inter-frame edges]

Yellow Edges: Commonly Used Intra-frame Edges

Blue Edges: Symmetric Intra-frame Edges

Red Edges: Inter-frame Edges

Intra-frame Simple Cycles

Inter-frame Simple Cycles

Too Many Simple Cycles!

NP Hard!!!

Page 66: Spatiotemporal Graphs for Object Segmentation, Human Pose

Idea 1: Abstraction

Abstract Body Parts Relational Graph Real Body Parts Relational Graph

Remove intra-frame simple cycles

Page 67: Spatiotemporal Graphs for Object Segmentation, Human Pose

Idea 2: Association

Pose Relational Graph (Tracklet Graph)

Remove the inter-frame simple cycles

Page 68: Spatiotemporal Graphs for Object Segmentation, Human Pose

Pipeline: N-Best Hypotheses, Real Body Part Hypotheses, Abstract Body Part Hypotheses, Abstract Body Part Tracklets, Tree-based Pose Estimation

Generate many full body pose hypotheses for each video frame

Page 69: Spatiotemporal Graphs for Object Segmentation, Human Pose

Pipeline: N-Best Hypotheses, Real Body Part Hypotheses, Abstract Body Part Hypotheses, Abstract Body Part Tracklets, Tree-based Pose Estimation

Generate real body part hypotheses for the frames

Page 70: Spatiotemporal Graphs for Object Segmentation, Human Pose

Pipeline: N-Best Hypotheses, Real Body Part Hypotheses, Abstract Body Part Hypotheses, Abstract Body Part Tracklets, Tree-based Pose Estimation

Combine Symmetric Parts

[Figure: Real Body Parts Relational Graph and Abstract Body Parts Relational Graph]

Page 71: Spatiotemporal Graphs for Object Segmentation, Human Pose

Pipeline: N-Best Hypotheses, Real Body Part Hypotheses, Abstract Body Part Hypotheses, Abstract Body Part Tracklets, Tree-based Pose Estimation

Tracklet Hypotheses Graph

Get Best Tracklets for each part

Page 72: Spatiotemporal Graphs for Object Segmentation, Human Pose

Pipeline: N-Best Hypotheses, Real Body Part Hypotheses, Abstract Body Part Hypotheses, Abstract Body Part Tracklets, Tree-based Pose Estimation

Pose Hypotheses Graph

Select Best Poses

Page 73: Spatiotemporal Graphs for Object Segmentation, Human Pose

Qualitative Results

Page 74: Spatiotemporal Graphs for Object Segmentation, Human Pose

Outdoor Dataset (video: warmup)

Ours N-Best

Page 75: Spatiotemporal Graphs for Object Segmentation, Human Pose

Outdoor Dataset (video: bounce)

Ours N-Best

Page 76: Spatiotemporal Graphs for Object Segmentation, Human Pose

Outdoor Dataset (videos: walk2, kick)

Ours N-Best (one comparison per video)

Page 77: Spatiotemporal Graphs for Object Segmentation, Human Pose

N-Best Dataset (video: baseball)

Ours N-Best

Page 78: Spatiotemporal Graphs for Object Segmentation, Human Pose

N-Best Dataset (video: walkstraight)

Ours N-Best

Page 79: Spatiotemporal Graphs for Object Segmentation, Human Pose

HumanEva Dataset (video: Jog)

Ours N-Best

Page 80: Spatiotemporal Graphs for Object Segmentation, Human Pose

HumanEva Dataset (video: Walking)

Ours N-Best

Page 81: Spatiotemporal Graphs for Object Segmentation, Human Pose

Quantitative Results

Page 82: Spatiotemporal Graphs for Object Segmentation, Human Pose

Outdoor Dataset

Metric   Method               Head   Torso   U.L.   L.L.   U.A.   L.A.   Average
PCP      Ours                 0.99   1.00    1.00   0.97   0.91   0.66   0.92
PCP      Ramakrishna et al.   0.99   0.86    0.95   0.96   0.86   0.52   0.86
PCP      Park et al.          0.99   0.83    0.92   0.86   0.79   0.52   0.82
KLE      Ours                 0.19   0.22    0.35   0.37   0.41   0.61   0.36
KLE      Ramakrishna et al.   0.39   0.58    0.48   0.48   0.88   1.42   0.71
KLE      Park et al.          0.44   0.58    0.55   0.69   1.03   1.65   0.82

Probability of a Correct Pose (PCP) is a precision metric; the larger, the better.

Keypoint Localization Error (KLE) is an error metric; the smaller, the better.

Page 83: Spatiotemporal Graphs for Object Segmentation, Human Pose

HumanEva I Dataset

Metric   Method               Head   Torso   U.L.   L.L.   U.A.   L.A.   Average
PCP      Ours                 1.00   1.00    1.00   0.94   0.93   0.67   0.92
PCP      Ramakrishna et al.   0.99   1.00    0.99   0.98   0.99   0.53   0.91
PCP      Park et al.          0.97   0.97    0.97   0.90   0.83   0.48   0.85
KLE      Ours                 0.16   0.42    0.13   0.15   0.20   0.24   0.22
KLE      Ramakrishna et al.   0.27   0.48    0.13   0.22   1.14   1.07   0.55
KLE      Park et al.          0.23   0.52    0.24   0.35   1.10   1.18   0.60

PCP is a precision metric; the larger, the better. KLE is an error metric; the smaller, the better.

Page 84: Spatiotemporal Graphs for Object Segmentation, Human Pose

N-Best Dataset

Metric   Method               Head   Torso   U.L.   L.L.   U.A.   L.A.   Average
PCP      Ours                 1.00   1.00    0.92   0.94   0.93   0.65   0.91
PCP      Ramakrishna et al.   1.00   0.69    0.91   0.89   0.85   0.42   0.80
PCP      Park et al.          1.00   0.61    0.86   0.84   0.66   0.41   0.73
KLE      Ours                 0.15   0.17    0.24   0.37   0.30   0.60   0.31
KLE      Ramakrishna et al.   0.53   0.88    0.67   1.01   1.70   2.68   1.25
KLE      Park et al.          0.54   0.74    0.80   1.39   2.39   4.08   1.66

PCP is a precision metric; the larger, the better. KLE is an error metric; the smaller, the better.

Page 85: Spatiotemporal Graphs for Object Segmentation, Human Pose

Summary

• HPEV can be well formulated into STGs

• STGs can be employed in multiple stages of HPEV

• Improved results

Page 86: Spatiotemporal Graphs for Object Segmentation, Human Pose

Action Localization in Videos through Context Walk

Khurram Soomro, Haroon Idrees and Mubarak Shah ICCV-2015

Page 87: Spatiotemporal Graphs for Object Segmentation, Human Pose

Action Recognition

Diving Lifting

Golf

Swing Bench Walking

Page 88: Spatiotemporal Graphs for Object Segmentation, Human Pose

Action Localization

1. Action Recognition

2. Action Detection
   a. Trimmed Videos
      i. Spatio-Temporal
   b. Untrimmed Videos
      i. Temporal
      ii. Spatio-Temporal

[Figure labels: Diving, Lifting, Swing Bench]

Page 89: Spatiotemporal Graphs for Object Segmentation, Human Pose

Challenges: Action Localization

• Cluttered Background

• Multiple Actors/Actions

• Untrimmed Videos

Basketball Dunk

Salsa Spin

Hand Waving/Clapping/Boxing

Page 90: Spatiotemporal Graphs for Object Segmentation, Human Pose

Applications of Action Localization

•Video Search

•Action Retrieval

•Multimedia Event Recounting

•Video Understanding

Page 91: Spatiotemporal Graphs for Object Segmentation, Human Pose

Existing Solutions to Action Localization

• 1) Learn Action Detector

• 2) Exhaustively search in testing videos

• Sliding Window approach is IMPRACTICAL and WASTEFUL!
  • Videos:
    • Untrimmed (Longer Duration)
    • High Resolution

Page 92: Spatiotemporal Graphs for Object Segmentation, Human Pose

• Action Localization in Videos through Context Walk
  • An efficient approach for action localization
  • Use of Context Relations that exist in videos: Action-Scene and Intra-Action
  • Action Contours instead of bounding boxes

Motivation | Context Graph | Context Walk | CRF | Results

Page 93: Spatiotemporal Graphs for Object Segmentation, Human Pose

• Context Relations
  • Learn Spatio-Temporal Relations between all the Supervoxels and those within the Action (Actor Bounding Box)
  • Arrows represent three-dimensional displacement vectors capturing:
    • Action-Scene Relations
    • Intra-Action Relations

Page 94: Spatiotemporal Graphs for Object Segmentation, Human Pose

• Context Graph
  • Given supervoxels in the nth Training Video
  • Construct a directed Graph Gn(Vn, En) for the video
    • Vn = Supervoxel nodes
    • En = Spatio-Temporal Relations
  • Edges emanate from all the nodes (supervoxels) to the nodes (supervoxels) contained within the Actor Bounding Box

[Figure: directed graph showing Action-Scene Relations and Intra-Action Relations]
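A minimal sketch of the context graph for one training video, assuming each supervoxel is summarized by an (x, y, t) centroid and a flag saying whether it lies inside the actor bounding box. Each directed edge stores the three-dimensional displacement from its source supervoxel to an actor supervoxel; the function name and data layout are hypothetical.

```python
def build_context_graph(centroids, inside_actor):
    """Directed edges from every supervoxel to every actor supervoxel.

    centroids: list of (x, y, t) supervoxel centres.
    inside_actor: list of bools, True if the supervoxel lies within
                  the actor bounding box.
    Returns {(i, j): (dx, dy, dt)} with j always an actor supervoxel.
    """
    edges = {}
    for i, source in enumerate(centroids):
        for j, target in enumerate(centroids):
            if i != j and inside_actor[j]:
                edges[(i, j)] = tuple(b - a for a, b in zip(source, target))
    return edges
```

Edges whose source is outside the box play the role of action-scene relations, and edges whose source is inside the box play the role of intra-action relations.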

Page 95: Spatiotemporal Graphs for Object Segmentation, Human Pose

• Context Walk
  • Given a Testing Video:
    1. Construct an Undirected Graph G(V,E)
       • Edges exist between Spatio-Temporal Neighbors
    2. Randomly Select Initial Node
    3. Find Nearest Neighbor Supervoxel from Training Data
    4. Project Displacement Vectors onto Testing Supervoxels
    5. Select Next Node with Max. Probability, Repeat (Steps 3-5)

[Figure label: Training Video Nc]
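Steps 2 to 5 of the walk can be sketched as below. This is a simplified illustration under several assumptions not stated in the slides: supervoxels are reduced to (x, y, t) centroids and feature tuples, a single nearest neighbour is used per step, projected displacements vote with a Gaussian kernel of width `sigma`, and votes simply accumulate into a confidence per test supervoxel. The neighbourhood graph of step 1 and the later CRF stage are omitted.

```python
import math
import random


def nearest(feature, train_features):
    """Index of the training supervoxel closest in feature space."""
    return min(range(len(train_features)),
               key=lambda i: math.dist(feature, train_features[i]))


def context_walk(test_centroids, test_features, train_features,
                 train_displacements, steps, sigma=1.0, seed=0):
    """Accumulate action confidence over test supervoxels.

    train_displacements[i]: list of (dx, dy, dt) vectors stored for
    training supervoxel i, each pointing towards the action.
    Returns (confidence per test supervoxel, visited node indices).
    """
    rng = random.Random(seed)
    n = len(test_centroids)
    confidence = [0.0] * n
    current = rng.randrange(n)              # step 2: random initial node
    visited = [current]
    for _ in range(steps):
        nn = nearest(test_features[current], train_features)   # step 3
        for d in train_displacements[nn]:                       # step 4
            target = tuple(c + dc
                           for c, dc in zip(test_centroids[current], d))
            for k, c in enumerate(test_centroids):
                dist = math.dist(c, target)
                confidence[k] += math.exp(-dist ** 2 / (2 * sigma ** 2))
        current = max(range(n), key=lambda k: confidence[k])   # step 5
        visited.append(current)
    return confidence, visited
```

Each step costs one nearest-neighbour lookup plus a pass over the test supervoxels, so a short walk visits far fewer locations than an exhaustive sliding-window search.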

Page 96: Spatiotemporal Graphs for Object Segmentation, Human Pose

Proposed Framework for Context Walk

(a) Segment Video into Supervoxels (SVs)

(b) Construct Spatio-temporal Graph G(V, E) using all SVs

(c) Search NNs using SV features, then project displacement vectors

(d) Update SVs' Conditional Distribution using all NNs

(e) Select SV with highest confidence

(f) Repeat for T steps

(g) Segment Action Proposals through CRF + SVM Classification

[Figure labels: SV (v), SV Features, Context Walk, CRF + SVM]

Page 97: Spatiotemporal Graphs for Object Segmentation, Human Pose

• UCF Sports Dataset

[Figure: Annotated Actor Bounding Box and Action Localization Contour]

Page 98: Spatiotemporal Graphs for Object Segmentation, Human Pose

• UCF Sports Dataset

[Figure: Annotated Actor Bounding Box and Action Localization Contour]

Page 99: Spatiotemporal Graphs for Object Segmentation, Human Pose

• Sub-JHMDB Dataset

[Figure: Annotated Actor Bounding Box and Action Localization Contour]

Page 100: Spatiotemporal Graphs for Object Segmentation, Human Pose

• Sub-JHMDB Dataset

[Figure: Annotated Actor Bounding Box and Action Localization Contour]

Page 101: Spatiotemporal Graphs for Object Segmentation, Human Pose

• THUMOS’13 Dataset

[Figure: Annotated Actor Bounding Box and Action Localization Contour]

Page 102: Spatiotemporal Graphs for Object Segmentation, Human Pose

• THUMOS’13 Dataset

[Figure: Annotated Actor Bounding Box and Action Localization Contour]

Page 103: Spatiotemporal Graphs for Object Segmentation, Human Pose

• Quantitative Results (UCF Sports)

Page 104: Spatiotemporal Graphs for Object Segmentation, Human Pose

• Quantitative Results (sub-JHMDB)

Page 105: Spatiotemporal Graphs for Object Segmentation, Human Pose

• Quantitative Results (THUMOS’13)

Page 106: Spatiotemporal Graphs for Object Segmentation, Human Pose

Summary

• Efficient and Effective approach for Action Localization

• Learn Contextual Relations in the form of relative locations between different video regions

• Use Context Walk to select supervoxel at each step and predict the Action Location

Page 108: Spatiotemporal Graphs for Object Segmentation, Human Pose

Conclusion

• Generic Object Segmentation in Videos
  • Single video (CVPR-2013)
  • Multiple videos (ECCV-2014)

• Human Pose Estimation in Videos (ICCV-2015)

• Human Action Detection in Videos (ICCV-2015)

Page 109: Spatiotemporal Graphs for Object Segmentation, Human Pose

YouTube Presentations

https://www.youtube.com/user/UCFCRCV

Page 110: Spatiotemporal Graphs for Object Segmentation, Human Pose

Thank You