Learning realistic human actions from movies

by Ivan Laptev, Marcin Marszalek, Cordelia Schmid, and Benjamin Rozenfeld

PRESENTATION BY KERRY SEITZ

Source: web.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/KerrySeitz1.pdf



The Problem

Recognize natural human actions

Realistic videos

Getting out of a car

Answering a phone

Performing CPR

Kissing

[LAPTEV ET AL. 2008]

Challenges

Lack of datasets

Variations in:

◦ Expression, posture, motion, and clothing

◦ Camera motion and perspective

◦ Illumination

◦ Occlusion and surroundings

[LAPTEV ET AL. 2008]

Automatic Annotation of Human Actions

Use movie scripts

Problems

◦ No time information

◦ Script and movie don’t always match

◦ Variations in phrasing


Script-to-Video Alignment

[LAPTEV ET AL. 2008]

Script-to-Video Alignment

Alignment score (a) for each scene

◦ Script-subtitle misalignment

◦ a = (# matched words) / (# all words)

Types of errors when a = 1

◦ Misaligned in time (10%)

◦ Outside the field of view (10%)

◦ Missing in the video (10%)
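The alignment score above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the slide only gives the ratio, so the matching rule (in-order word runs found by `difflib`), the choice of script words as the denominator, and the function names are all assumptions.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def alignment_score(script_text, subtitle_text):
    """Return a = (# matched words) / (# all words) for one scene.

    Assumption: a word counts as matched when it lies in a common in-order
    run of the script and the subtitle, and "all words" means script words.
    """
    script_words = tokenize(script_text)
    subtitle_words = tokenize(subtitle_text)
    if not script_words:
        return 0.0
    matcher = SequenceMatcher(None, script_words, subtitle_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(script_words)
```

A scene whose script dialogue appears word for word in the subtitles scores 1.0. As the error breakdown above shows, even a perfect score does not guarantee that the action is visible in the clip.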

[LAPTEV ET AL. 2008]

Text Retrieval of Human Actions

Phrasing variations

◦ “Will gets out of the Chevrolet.”

◦ “A black car pulls up. Two army officers get out.”

◦ “Erin exits her new truck.”

False positives

◦ “About to sit down, he freezes.”

Keyword search is insufficient!


Text Retrieval of Human Actions

Train classifier for each action (bag of features model)

◦ Words

◦ Adjacent pairs of words

◦ Pairs of words within a window of N words (2 ≤ N ≤ 8)

Regularized perceptron

◦ Equivalent to SVM

◦ Trained on manually labeled scene descriptions

◦ Tuned using validation set
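The three text feature types listed above could be extracted roughly as follows. This is a sketch under assumptions: the tokenizer, the tuple encoding of features, and the reading of "within a window of N words" as a position gap smaller than N are illustrative choices, not details taken from the slides.

```python
import re


def text_features(description, max_window=8):
    """Bag of features for one scene description.

    Emits single words, adjacent word pairs, and non-adjacent word pairs
    whose positions differ by less than max_window (the slide's N).
    """
    words = re.findall(r"[a-z']+", description.lower())
    features = set(("word", w) for w in words)
    for i, first in enumerate(words):
        for gap in range(1, max_window):
            j = i + gap
            if j >= len(words):
                break
            kind = "adjacent" if gap == 1 else "window"
            features.add((kind, first, words[j]))
    return features
```

For "Erin exits her new truck", the set contains `("word", "truck")`, `("adjacent", "erin", "exits")`, and `("window", "exits", "truck")` when the window is large enough. These sets would then be turned into sparse vectors for the per-action classifier.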


Text Retrieval of Human Actions

[LAPTEV ET AL. 2008]

The Datasets

Manual and Test Sets

◦ Manually annotated scripts

◦ Manually selected visually-correct action samples

Automatic Set

◦ Automatically annotated scripts

◦ Automatically selected action samples

◦ a > 0.5

◦ Length < 1,000 frames
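The two selection criteria for the automatic set amount to a simple filter. The `Clip` record, its field names, and the action labels below are hypothetical, introduced only to illustrate the thresholds stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    action: str        # label retrieved from the script text (illustrative)
    alignment: float   # alignment score a of the scene
    num_frames: int    # clip length in frames


def select_automatic_samples(clips, min_alignment=0.5, max_frames=1000):
    """Keep clips with a > min_alignment and length < max_frames."""
    return [
        clip for clip in clips
        if clip.alignment > min_alignment and clip.num_frames < max_frames
    ]
```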

[LAPTEV ET AL. 2008]

KTH Dataset

[LAPTEV ET AL. 2008]

Action Recognition

Sparse space-time features

◦ Compact representation

◦ Tolerant to background clutter, occlusions, and scale changes

Interest point detection – Harris operator

◦ Multiple levels of spatio-temporal scales


Interest Point Detection

[LAPTEV ET AL. 2008]

Features at the Interest points

Histogram of descriptors of space-time volumes

◦ Volumes divided into (nx, ny, nt) grid of cuboids

◦ Compute histogram of oriented gradients (HoG)

◦ Compute histogram of optic flow (HoF)
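The grid-of-cuboids descriptor can be sketched as one routine that serves for both HoG and HoF, since each is an orientation histogram accumulated per cuboid and then concatenated. The input format (precomputed orientation samples with normalized coordinates), the default grid and bin count, and the L1 normalization are assumptions made for illustration.

```python
import math


def cuboid_histograms(samples, grid=(3, 3, 2), n_bins=4):
    """Concatenate per-cuboid orientation histograms for one space-time volume.

    samples: iterable of (x, y, t, angle, magnitude), where x, y, t are
    normalized to [0, 1) inside the volume and angle is in radians.
    For HoG the angle is a gradient orientation, for HoF a flow direction.
    """
    nx, ny, nt = grid
    hist = [0.0] * (nx * ny * nt * n_bins)
    for x, y, t, angle, magnitude in samples:
        # Locate the cuboid that contains this sample.
        cx = min(int(x * nx), nx - 1)
        cy = min(int(y * ny), ny - 1)
        ct = min(int(t * nt), nt - 1)
        # Fold the angle into [0, 2*pi) and pick an orientation bin.
        fraction = (angle % (2 * math.pi)) / (2 * math.pi)
        b = min(int(fraction * n_bins), n_bins - 1)
        cell = (ct * ny + cy) * nx + cx
        hist[cell * n_bins + b] += magnitude
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist
```

The descriptor length is nx * ny * nt * n_bins, so finer grids capture more layout at the cost of a longer, sparser vector.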

[IKIZLER ET AL. 2008]

Spatio-Temporal Bag-of-Features

k-means with 4,000 clusters

Different grid sizes

Classify with non-linear SVM
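A rough sketch of the bag-of-features step: each descriptor is assigned to its nearest visual word, and word counts are accumulated per grid cell. Only a temporal grid is shown for brevity. The function names and input format are hypothetical, and the χ² distance is included as one common way to compare histograms inside a non-linear kernel, which is an assumption since the slide does not name the kernel.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the closest visual word (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((d - c) ** 2 for d, c in zip(descriptor, center))
    return min(range(len(vocabulary)), key=lambda k: sq_dist(vocabulary[k]))


def bof_histogram(features, vocabulary, t_bins=1):
    """Normalized visual-word histogram with one block per temporal cell.

    features: list of (t, descriptor), t normalized to [0, 1) over the clip.
    vocabulary: cluster centers, e.g. from k-means (4,000 on the slide).
    """
    k = len(vocabulary)
    hist = [0.0] * (t_bins * k)
    for t, descriptor in features:
        cell = min(int(t * t_bins), t_bins - 1)
        hist[cell * k + nearest_word(descriptor, vocabulary)] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def chi2_distance(h1, h2):
    """Chi-square distance between two histograms of equal length."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
```

With `t_bins=1` this is a plain bag of features. Larger values keep coarse temporal ordering, which is what the different grid sizes trade off.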

[LAPTEV ET AL. 2008]

Evaluation of Spatio-Temporal Grids

[LAPTEV ET AL. 2008]

Evaluation of Spatio-Temporal Grids

[LAPTEV ET AL. 2008]

Comparison to the State-of-the-Art

KTH dataset divided into:

◦ Training/validation set (8+8 people)

◦ Test set (9 people)

Use best performing channel combination

[LAPTEV ET AL. 2008]

Confusion Matrix

[LAPTEV ET AL. 2008]

Noise in Training Data

[LAPTEV ET AL. 2008]

Results for Real-World Videos

[LAPTEV ET AL. 2008]

Examples

[LAPTEV ET AL. 2008]

Summary

Automatic annotation using movie scripts

Action recognition outperforms the prior state of the art on the KTH dataset

System tolerant to errors in training data


Future Work

Improve script-to-video alignment

Improve tolerance of classifier to errors in training data

◦ Iterative learning

Experiment with other space-time low-level features


Questions?

[LAPTEV ET AL. 2008]

References

Learning Realistic Human Actions from Movies. I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. CVPR 2008.

Human Action Recognition with Line and Flow Histograms. N. Ikizler, G. Cinbis, and P. Duygulu. ICPR 2008.
