
Automatic Construction of an Action Video Shot Database


Summary
• A novel method to collect “action” video shots from the Web
• Fully automatic and unsupervised
• Requires only “action keywords” such as “paint+picture” as input
• Tag-based video selection followed by ST-feature-based shot selection
• Large-scale experiments on 100 kinds of human actions
• 36.6% prec@100 over all 100 actions, 79.8% for the top 10 actions

Automatic Construction of an Action Video Shot Database using Web Videos
DO HANG NGA and KEIJI YANAI
The University of Electro-Communications, Tokyo, Japan

[1] Q. Yang, X. Chen and G. Wang: “Web 2.0 Dictionary”, Proc. of ACM International Conference on Image and Video Retrieval (CIVR), pp. 591–600, 2008.
[2] A. Noguchi and K. Yanai: “A SURF-based Spatio-Temporal Feature for Feature-fusion-based Action Recognition”, Proc. of ECCV Workshop on Human Motion: Understanding, Modeling, Capture and Animation, 2010.
[3] Y. Jing and S. Baluja: “VisualRank: Applying PageRank to Large-Scale Image Search”, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 30, No. 11, pp. 1870–1890, 2008.

Experiments & Results

Table 1: Experimental conditions and results (mean precision@100 over 6 actions)

Exp. No. | Tag-based ranking | Biased damping vector | Visual feature | Mean prec@100
RND      | (100 randomly selected shots)                              | 14.2%
TAG      | ✔                 | -                     | -              | 23.5%
1        | -                 | -                     | ST             | 33.7%
2        | ✔                 | -                     | ST             | 41.0%
3(1)     | ✔                 | ✔(1)                  | ST             | 47.3%
3(2)     | ✔                 | ✔(2)                  | ST             | 44.8%
5        | ✔                 | ✔(1)                  | Motion         | 31.8%
6        | ✔                 | ✔(1)                  | Appearance     | 39.7%
7        | ✔                 | ✔(1)                  | Fusion         | 49.5%
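
The prec@100 figures reported above can be reproduced from per-shot relevance judgments. The following is a minimal sketch, assuming each action has a ranked list of boolean relevance labels; the function names and the dictionary layout are illustrative and do not come from the poster.

```python
def precision_at_k(ranked_relevance, k=100):
    """Fraction of relevant shots among the top-k entries of a ranked list.

    ranked_relevance: list of booleans ordered from best- to worst-ranked shot.
    The count is divided by k, so a list shorter than k is penalized.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for rel in ranked_relevance[:k] if rel) / k


def mean_precision_at_k(relevance_by_action, k=100):
    """Average precision@k over all actions (hypothetical dict: action -> labels)."""
    scores = [precision_at_k(labels, k) for labels in relevance_by_action.values()]
    return sum(scores) / len(scores)
```

For example, `mean_precision_at_k({"surf+wave": labels_a, "eat+ramen": labels_b})` would return the mean over the two actions, in the same way the table averages over six.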


See the shot results for all 100 actions at http://mm.cs.uec.ac.jp/webvideo/

Overview of Proposed Method

[Figure: a two-step unsupervised method. Web videos → Tag-based Video Selection → relevant videos → Feature-based Shot Selection (shot ranking) → relevant shots, e.g. “surf wave” shots.]

Introduction & Objective

Among the Web videos retrieved for a given action keyword (e.g. “surf + wave”), many are irrelevant or only partly relevant.

Objective: extract the relevant video shots from these Web videos with an unsupervised method.

Tag-based Video Selection

• Construct a “tag relevance score table” [1] by counting tag frequencies and co-frequencies through a Web API
• Calculate tag-based “video relevance scores” for all the 2000 videos retrieved for one keyword
• Select the top 200 videos according to their video relevance scores

Example tag relevance score table (query: running+marathon)

Tag      | Relevance score
Run      | 0.18248175
Training | 0.13321168
…        | …
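
The counting step described above can be sketched in a few lines. This is an illustration only: it assumes the tag lists are already available locally as Python lists (the poster obtains the counts through a Web API), it treats the keyword as a single tag, and it normalizes each co-frequency by the frequency of the co-occurring tag. `tag_relevance_table` and its arguments are hypothetical names.

```python
from collections import Counter


def tag_relevance_table(tag_lists, keyword):
    """Build {tag: relevance} from tag frequencies and co-frequencies.

    tag_lists: iterable of tag lists, one per video (hypothetical local sample).
    relevance(tag) = co-freq(keyword, tag) / freq(tag)
    """
    freq = Counter()    # number of videos carrying each tag
    cofreq = Counter()  # number of videos carrying both the keyword and the tag
    for tags in tag_lists:
        unique = set(tags)  # count each tag at most once per video
        freq.update(unique)
        if keyword in unique:
            for tag in unique - {keyword}:
                cofreq[tag] += 1
    return {tag: cofreq[tag] / freq[tag] for tag in cofreq}
```

Tags that never co-occur with the keyword are simply absent from the returned table, so a caller has to decide how to treat them when scoring videos.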

How to Calculate Video Relevance Scores

Feature-based Shot Selection

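VisualRank [3]

$$\mathbf{r} = d\,S^{*}\mathbf{r} + (1 - d)\,\mathbf{p}$$

where $S^{*}$ is the column-normalized similarity matrix between shots, $d$ is the damping factor, and $\mathbf{p}$ is the damping vector.

Bias the computation through the damping vector, using one of the following:

(1) $$p_i = \begin{cases} \dfrac{1}{m} & (1 \le i \le m) \\ 0 & (m < i \le n) \end{cases} \qquad n \approx 2000,\ m = 1000$$

(2) $$p_i = \frac{Sc(i)}{C}, \qquad C = \sum_{j=1}^{n} Sc(j)$$

$Sc(j)$: tag relevance score of the video from which shot $j$ was extracted

Combine the 3 features

$$S^{*}_{\mathrm{ALL}} = w_{ST}\,S^{*}_{ST} + w_{MO}\,S^{*}_{MO} + w_{AP}\,S^{*}_{AP}, \qquad w_{ST} = \frac{1}{2},\ w_{MO} = w_{AP} = \frac{1}{4}$$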
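
The ranking and fusion equations above can be tried out with a short power-iteration sketch. Several points are assumptions rather than details from the poster: the damping factor `d=0.85`, the iteration count and tolerance, the ordering of shots by tag-based rank for damping vector (1), and non-negative video scores for damping vector (2). All function names are illustrative.

```python
import numpy as np


def column_normalize(S):
    """Scale every column of S to sum to 1 (all-zero columns are left as zeros)."""
    S = np.asarray(S, dtype=float)
    col_sums = S.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0
    return S / col_sums


def uniform_top_m_vector(n, m):
    """Damping vector (1): 1/m for the first m shots, 0 for the rest.
    Assumes shots are already sorted by tag-based rank."""
    p = np.zeros(n)
    p[:m] = 1.0 / m
    return p


def score_biased_vector(video_scores):
    """Damping vector (2): p_i = Sc(i) / C. Assumes non-negative scores."""
    sc = np.asarray(video_scores, dtype=float)
    return sc / sc.sum()


def fuse(S_st, S_mo, S_ap, w_st=0.5, w_mo=0.25, w_ap=0.25):
    """Weighted sum of the three column-normalized similarity matrices."""
    return (w_st * column_normalize(S_st)
            + w_mo * column_normalize(S_mo)
            + w_ap * column_normalize(S_ap))


def visualrank(S, p, d=0.85, iters=100, tol=1e-10):
    """Iterate r = d * S* r + (1 - d) * p until the L1 change drops below tol."""
    S_star = column_normalize(S)
    p = np.asarray(p, dtype=float)
    r = np.full(len(p), 1.0 / len(p))
    for _ in range(iters):
        r_next = d * (S_star @ r) + (1.0 - d) * p
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r
```

Sorting shots by the returned vector in descending order, for instance with `np.argsort(-r)`, gives the shot ranking; the top-ranked shots are taken as the relevant ones.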

[Figure: shot selection pipeline. Downloaded videos → video segmentation (color histogram) → Shot 1, …, Shot n → feature extraction → bag-of-features representation → similarity matrix calculation (histogram intersection) → VisualRank calculation → relevant shots (top-ranked shots).]

Visual features:
① Spatio-Temporal Feature (ST) [2]
② Motion Feature (MO)
③ Appearance Feature (AP)
④ Fusion of the above 3 features
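
The similarity-matrix step of the pipeline above relies on histogram intersection between bag-of-features vectors. A minimal sketch follows, assuming each shot is represented by a non-negative histogram and that histograms are L1-normalized first; the normalization is an assumption made here so that every similarity lies in [0, 1], not a detail stated on the poster.

```python
import numpy as np


def similarity_matrix(bof_histograms):
    """Pairwise histogram intersection between bag-of-features histograms.

    bof_histograms: array-like of shape (n_shots, n_codewords), non-negative.
    Returns an (n_shots, n_shots) symmetric matrix with S[i, j] = sum_k min(h_i[k], h_j[k]).
    """
    H = np.asarray(bof_histograms, dtype=float)
    sums = H.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0  # keep empty histograms as all zeros
    H = H / sums           # L1-normalize each histogram (assumption)
    n = H.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        # Broadcast shot i against all shots and sum the element-wise minima.
        S[i] = np.minimum(H[i], H).sum(axis=1)
    return S
```

The resulting matrix can be column-normalized and passed to a VisualRank iteration to rank the shots.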

Table 2: Prec@100 (%) of the top 60 of the 100 human actions

Action             | Prec | Action         | Prec | Action              | Prec
soccer+dribble     | 100  | shoot+football | 58   | shoot+football      | 33
fold+origami       | 96   | tie+shoelace   | 57   | draw+eyebrows       | 32
crochet+hat        | 95   | laugh          | 50   | fieldhockey+dribble | 32
arrange+flower     | 94   | dive+sea       | 49   | hit+golfball        | 32
paint+picture      | 88   | harvest+rice   | 49   | lunge               | 32
boxing             | 86   | ski            | 49   | play+piano          | 32
comb+hair          | 83   | iron+clothes   | 47   | row+boat            | 32
parachute+jump     | 82   | twist+crunch   | 47   | sing                | 32
do+exercise        | 79   | dance+flamenco | 45   | chat+friend         | 31
do+aerobics        | 78   | dance+hiphop   | 43   | clean+floor         | 31
do+yoga            | 77   | eat+ramen      | 42   | cut+onion           | 31
serve+volleyball   | 75   | dance+tango    | 41   | shave+mustache      | 31
surf+wave          | 75   | play+trumpet   | 41   | pick+lock           | 30
shoot+arrow        | 73   | play+drum      | 40   | plaster+wall        | 30
fix+tire           | 67   | skate          | 37   | blow+candle         | 29
basketball+dribble | 64   | swim+crawl     | 36   | wash+face           | 29
blow-dry+hair      | 64   | cut+hair       | 35   | walking+street      | 29
ride+bicycle       | 62   | run+marathon   | 35   | brush+teeth         | 28
bowl+ball          | 58   | count+money    | 33   | catch+fish          | 28
curl+bicep         | 58   | paint+wall     | 33   | drive+car           | 28

(1) Six-action experiments under various conditions

[Figure labels from the method overview: “surf wave” Web videos (bunch of videos) → video ranking → shot segmentation.]

(2) Large-scale 100-action experiments

[Figure panels: batting, jumping trampoline, running marathon, walking street, shooting football, eating ramen.]

[Figure: tag-based video selection. Action keyword → tag lists of 1000 videos (gathering no video files, only the tag lists) → video relevance scores for 1000 videos → download the top 200 videos.]

$$P(b \mid a) = \frac{\text{co-freq}(a, b)}{\text{freq}(b)}$$

$$S(V \mid t) = \frac{1}{n} \sum_{t_i \in T_V} \log P(t_i \mid t)$$

$t$: given keyword; $T_V$ = {all the tags except $t$ in video $V$}
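
The video relevance score above averages log tag relevances, which can be sketched as follows. This is an illustration under stated assumptions: `relevance` is a precomputed mapping from tag to P(tag | keyword), n is taken to be the number of tags in T_V, and a small `floor` replaces zero or missing probabilities so that the logarithm stays finite. None of these choices, nor the function names, are specified on the poster.

```python
import math


def video_relevance_score(video_tags, keyword, relevance, floor=1e-6):
    """Mean of log P(t_i | keyword) over the video's tags other than the keyword.

    relevance: dict mapping tag -> P(tag | keyword) (hypothetical precomputed table).
    floor: assumed smoothing value for tags missing from the table.
    """
    others = set(video_tags) - {keyword}
    if not others:
        return float("-inf")  # no evidence at all: rank such videos last
    total = sum(math.log(max(relevance.get(tag, 0.0), floor)) for tag in others)
    return total / len(others)


def select_top_videos(videos, keyword, relevance, top_k=200):
    """Return the ids of the top_k videos by relevance score.

    videos: dict mapping video id -> list of tags (hypothetical layout).
    """
    ranked = sorted(
        videos,
        key=lambda vid: video_relevance_score(videos[vid], keyword, relevance),
        reverse=True,
    )
    return ranked[:top_k]
```

Because only tag lists are needed for scoring, the selection can run before any video file is downloaded, which matches the order of steps in the figure.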