8/2/2019 CVPR2010: grouplet: a structured image representation for recognizing human and object interactions
http://slidepdf.com/reader/full/cvpr2010-grouplet-a-structured-image-representation-for-recognizing-human 1/37
3/22/12
Grouplet: A Structured Image Representation for Recognizing Human and Object Interactions
Bangpeng Yao and Li Fei-Fei
Computer Science Department, Stanford University
bangpeng,[email protected]
Human-Object Interaction
Image labels: Playing saxophone; Human; Saxophone; Not playing saxophone
Human-Object Interaction
• Robots interact with objects
• Automatic sports commentary: “Kobe is dunking the ball.”
• Medical care
Background: Human-Object Interaction
• Schneiderman & Kanade, 2000 • Viola & Jones, 2001 • Huang et al, 2007 • Papageorgiou & Poggio, 2000 • Wu & Nevatia, 2005 • Dalal & Triggs, 2005 • Mikolajczyk et al, 2005 • Leibe et al, 2005 • Bourdev & Malik, 2009 • Felzenszwalb & Huttenlocher, 2005 • Ren et al, 2005 • Ramanan, 2006 • Ferrari et al, 2008 • Yang & Mori, 2008 • Andriluka et al, 2009 • Eichner & Ferrari, 2009
• Lowe, 1999 • Belongie et al, 2002 • Fergus et al, 2003
• Fei-Fei et al, 2004 • Berg & Malik, 2005 • Felzenszwalb et al, 2005 • Grauman & Darrell, 2005 • Sivic et al, 2005 • Lazebnik et al, 2006 • Zhang et al, 2006 • Savarese et al, 2007
• Lampert et al, 2008 • Desai et al, 2009 • Gehler & Nowozin, 2009
• Murphy et al, 2003 • Hoiem et al, 2006 • Shotton et al, 2006
• Rabinovich et al, 2007 • Heitz & Koller, 2008 • Divvala et al, 2009
• Gupta et al, 2009
context
vs.
To be done
• Yao & Fei-Fei, 2010a • Yao & Fei-Fei, 2010b
Outline
• Intuition of Grouplet Representation
• Grouplet Feature Representation
• Using Grouplet for Recognition
• Dataset & Experiments
• Conclusion
Recognizing Human-Object Interaction is Challenging
Reference image: playing saxophone
• Different background
• Same object (saxophone), different interactions
• Different pose (or viewpoint)
• Different lighting
• Different instrument, similar pose
Grouplet: our intuition
Representations:
Bag-of-words
• Thomas & Malik, 2001 • Csurka et al, 2004 • Fei-Fei & Perona, 2005 • Sivic et al, 2005
Spatial pyramid
• Grauman & Darrell, 2005 • Lazebnik et al, 2006
Part-based
• Weber et al, 2000 • Fergus et al, 2003 • Leibe et al, 2004 • Felzenszwalb et al, 2005 • Bourdev & Malik
Grouplet
Grouplet representation:
• Part-based configuration
• Co-occurrence
• Discriminative
• Dense
Captures the subtle differences in human-object interactions.
Grouplet representation (e.g. 2-Grouplet)
Notations
• I: Image.
• P: Reference point in the image.
• Λ: Grouplet, e.g. Λ = {λ1, λ2}.
• λi: Feature unit, λi = {Ai, xi, σi}.
- Ai: Visual codeword;
- xi: Image location;
- σi: Variance of spatial distribution.
• ν(Λ, I): Matching score of Λ and I.
• ν(λi, I): Matching score of λi and I.
• For an image patch:
- a′: Its visual appearance;
- x′: Its image location.
• Ω(x): Image neighborhood of x.
• Δ: A small shift of the location.
Matching score between Λ and I:
$$\nu(\Lambda, I) = \min_i \nu(\lambda_i, I)$$
Matching score between λi and I (codeword assignment score times Gaussian density value, summed over the neighborhood):
$$\nu(\lambda_i, I) = \sum_{x' \in \Omega(x_i)} p(A_i \mid a') \cdot N(x' \mid x_i, \sigma_i)$$
With small location shifts Δj:
$$\nu(\Lambda, I) = \min_i \max_j \sum_{x' \in \Omega(x_i + \Delta_j)} p(A_i \mid a') \cdot N(x' \mid x_i + \Delta_j, \sigma_i)$$
Figure labels: visual codewords; Gaussian distribution.
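The matching score can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `FeatureUnit` and `Patch` containers, the circular neighborhood of radius `radius` standing in for Ω(x), and the isotropic 2-D Gaussian with variance `var` are all hypothetical choices made here for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureUnit:
    codeword: int   # A_i: index of a visual codeword
    x: tuple        # x_i: expected image location
    var: float      # sigma_i: isotropic spatial variance (assumed form)


@dataclass
class Patch:
    x: tuple               # x': patch location
    codeword_probs: dict   # codeword index -> p(A | a') (assumed soft assignment)


def gaussian_density(x, mean, var):
    """Isotropic 2-D Gaussian density N(x | mean, var)."""
    d2 = (x[0] - mean[0]) ** 2 + (x[1] - mean[1]) ** 2
    return math.exp(-d2 / (2.0 * var)) / (2.0 * math.pi * var)


def unit_score(unit, patches, radius, shift=(0.0, 0.0)):
    """nu(lambda_i, I), with the unit's location moved by `shift`."""
    center = (unit.x[0] + shift[0], unit.x[1] + shift[1])
    total = 0.0
    for patch in patches:
        if math.dist(patch.x, center) <= radius:  # x' inside the neighborhood
            p = patch.codeword_probs.get(unit.codeword, 0.0)
            total += p * gaussian_density(patch.x, center, unit.var)
    return total


def grouplet_score(units, patches, radius, shifts=((0.0, 0.0),)):
    """nu(Lambda, I): min over units of the best score over the shifts."""
    return min(
        max(unit_score(u, patches, radius, s) for s in shifts)
        for u in units
    )


# Toy image with two patches (made-up values).
patches = [
    Patch((0.0, 0.0), {1: 1.0}),
    Patch((5.0, 0.0), {2: 0.5}),
]
u1 = FeatureUnit(codeword=1, x=(0.0, 0.0), var=1.0)
u2 = FeatureUnit(codeword=2, x=(5.0, 0.0), var=1.0)
u3 = FeatureUnit(codeword=3, x=(0.0, 0.0), var=1.0)

score_present = grouplet_score([u1, u2], patches, radius=2.0)
score_absent = grouplet_score([u1, u3], patches, radius=2.0)
print(score_present, score_absent)
```

Because the grouplet score is a minimum over its feature units, a single unit with no support in the image (here `u3`) drives the whole score to zero, which is how the co-occurrence requirement shows up in the formula.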
Grouplet representation
• Part-based configuration
• Co-occurrence
• Discriminative
Matching scores on example images: 0.6, 0.4, 0.0, 0.1
Image groups: Playing saxophone; Other interactions
Grouplet representation
• Part-based configuration
• Co-occurrence
• Discriminative
• Dense
- All possible codewords
- Densely sample image locations
- Many possible spatial distributions
- All possible combinations of feature units: 1-grouplet, 2-grouplet, 3-grouplet, …
A “Space” of Grouplets
• Playing violin vs. other interactions
• Playing saxophone vs. other interactions
• On background
• Shared by different interactions
We only need discriminative Grouplets
• Playing violin: large ν(Λ, I); other interactions: small ν(Λ, I)
• Playing saxophone: large ν(Λ, I); other interactions: small ν(Λ, I)
• Also in the space: grouplets on background; grouplets shared by different interactions
• Number of feature units: N. N is large (192200).
• Number of Grouplets: 2^N, a very large space.
Obtaining discriminative grouplets for a class
• Obtain grouplets with large ν(Λ, I) on the class.
• Remove grouplets with large ν(Λ, I) on other classes.
• Apriori mining [Agrawal & Srikant, 1994]: selected 1-grouplets, then candidate 2-grouplets, and so on.
• Number of feature units: N. N is large (192200). Number of Grouplets: 2^N, a very large space.
• To mine 1000~2000 grouplets, only (2~100)×N grouplets need to be evaluated.
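The level-wise search can be sketched as follows. Because ν(Λ, I) is a minimum over feature units, adding a unit to a grouplet can never raise its score, so a grouplet that fails on the class cannot have a successful superset; that is the property Apriori-style pruning relies on. The concrete criterion used below (a score threshold plus a minimum fraction of images, named `score_threshold`, `min_support`, and `max_other_support`) is an assumption made for illustration; the slides only say "large ν(Λ, I)".

```python
from itertools import combinations


def support(grouplet, unit_scores, score_threshold):
    """Fraction of images whose grouplet score (min over units) passes the threshold."""
    hits = sum(
        1 for image in unit_scores
        if min(image[u] for u in grouplet) >= score_threshold
    )
    return hits / len(unit_scores)


def mine_grouplets(unit_scores, score_threshold, min_support, max_size):
    """Level-wise (Apriori-style) search for grouplets that score high on one class."""
    n_units = len(unit_scores[0])
    frequent = {}
    level = {}
    for u in range(n_units):  # selected 1-grouplets
        g = frozenset([u])
        s = support(g, unit_scores, score_threshold)
        if s >= min_support:
            level[g] = s
    size = 1
    while level:
        frequent.update(level)
        if size == max_size:
            break
        # Candidate (size+1)-grouplets: every size-subset must already be selected.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == size + 1 and all(
                frozenset(sub) in level for sub in combinations(union, size)
            ):
                candidates.add(union)
        level = {}
        for g in candidates:
            s = support(g, unit_scores, score_threshold)
            if s >= min_support:
                level[g] = s
        size += 1
    return frequent


def keep_discriminative(frequent, other_unit_scores, score_threshold, max_other_support):
    """Drop grouplets that also score high on images of other classes."""
    return [
        g for g in frequent
        if support(g, other_unit_scores, score_threshold) <= max_other_support
    ]


# Made-up per-image unit scores nu(lambda_u, I) for 3 feature units.
class_scores = [
    [0.9, 0.8, 0.1],
    [0.7, 0.9, 0.0],
    [0.8, 0.1, 0.9],
    [0.6, 0.7, 0.2],
]
other_scores = [
    [0.9, 0.1, 0.2],
    [0.8, 0.2, 0.1],
]
frequent = mine_grouplets(class_scores, score_threshold=0.5, min_support=0.75, max_size=3)
discriminative = keep_discriminative(frequent, other_scores, 0.5, max_other_support=0.25)
print(sorted(sorted(g) for g in discriminative))
```

In this toy run, feature unit 0 fires on every image of both classes, so it is mined but then removed as non-discriminative, while the 2-grouplet {0, 1} survives.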
Using Grouplets for Classification
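• N discriminative grouplets: $[\Lambda_1, \ldots, \Lambda_N]$
• Each image I is represented by its matching scores: $[\nu(\Lambda_1, I), \ldots, \nu(\Lambda_N, I)]$
• Classifier: SVM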
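A minimal sketch of this step, assuming per-unit matching scores are already computed for each image. The slides do not say which SVM solver or kernel was used; the tiny linear SVM below (hinge-loss subgradient descent, no bias term, binary labels) is a stand-in chosen here so the example has no dependencies, and all data values are made up.

```python
def feature_vector(unit_scores, grouplets):
    """[nu(Lambda_1, I), ..., nu(Lambda_N, I)] from the per-unit scores of one image."""
    return [min(unit_scores[u] for u in g) for g in grouplets]


def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Hinge-loss subgradient descent without a bias term (toy stand-in for an SVM solver)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [wi * (1.0 - lr * lam) for wi in w]  # L2 shrinkage
            if margin < 1.0:                         # margin violated: move toward the sample
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Two mined grouplets over three feature units (hypothetical).
grouplets = [(0, 1), (2,)]
train_unit_scores = [
    [0.6, 0.9, 0.0],
    [0.4, 0.7, 0.1],
    [0.0, 0.8, 0.5],
    [0.3, 0.1, 0.6],
]
labels = [1, 1, -1, -1]  # +1: the interaction of interest, -1: anything else

X = [feature_vector(s, grouplets) for s in train_unit_scores]
w = train_linear_svm(X, labels)
predictions = [predict(w, x) for x in X]
print(predictions)
```

A multi-class problem such as the 7-class task in the experiments would need one such classifier per class or a multi-class SVM; that extension is not shown here.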
People-Playing-Musical-Instruments (PPMI) Dataset
http://vision.stanford.edu/resources_links.html
• Two image sets: PPMI+ and PPMI-
• # Image: (172) (164) (191) (148) (177) (133) (179) (149) (200) (188) (198) (169) (185) (167)
• Original image → Normalized image (200 images each interaction)
Recognition Tasks on People-Playing-Musical-Instruments (PPMI) Dataset
Classification
• Playing different instruments
• Playing vs. Not playing (e.g. playing violin vs. not playing violin)
Detection
Example labels: playing saxophone, playing bassoon, playing saxophone, playing French horn, playing violin
For each interaction, 100 training and 100 testing images.
Classification: Playing Different Instruments
• 7-class classification on PPMI+ images
• Methods compared: Grouplet+SVM, SPM [Lazebnik et al, 2006], DPM [Felzenszwalb et al, 2008], Constellation [Fergus et al, 2003; Niebles & Fei-Fei, 2007], BoW
• Classification accuracy values shown on the chart: 65.7%, 59.9%, 54.9%, 39.0%, 37.7%
[Figure: bar chart of classification accuracy per method]
[Figure: No. of mined Grouplets (0 to 1200) vs. Grouplet size (1 to 6)]
Classifying Playing vs. Not playing
• Seven 2-class classification problems: PPMI+ vs. PPMI- for each instrument.
• Instruments: Bassoon, Erhu, Flute, French horn, Guitar, Saxophone, Violin
• Methods compared: BoW, DPM, SPM, Grouplet+SVM
[Figure: accuracy of each method per instrument, shown with the average PPMI+ images and the average PPMI- images. Callouts: (1) Bassoon, Erhu, Flute, French horn, Saxophone, Violin; (2) Guitar.]
Detecting people playing musical instruments
Procedure:
• Face detection with a low threshold;
• Crop and normalize image regions;
• 8-class classification:
- 7 classes of playing instruments;
- Another class of not playing any instrument.
Example outputs: Playing saxophone; No playing; No playing
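The three-step procedure can be sketched as a small pipeline. Every component here is a hypothetical placeholder: `detect_faces`, `crop_and_normalize`, and `classify` are passed in as callables, and the stub versions below exist only so the example runs; they are not the detectors or classifiers used in the talk.

```python
NOT_PLAYING = "not playing"


def detect_playing(image, detect_faces, crop_and_normalize, classify, face_threshold):
    """Run the three steps and keep regions classified as playing an instrument."""
    detections = []
    for box in detect_faces(image, face_threshold):   # step 1: permissive face detection
        region = crop_and_normalize(image, box)       # step 2: crop and normalize
        label, confidence = classify(region)          # step 3: 8-class classification
        if label != NOT_PLAYING:
            detections.append((box, label, confidence))
    return detections


# Stub components (hypothetical, for illustration only).
def fake_detect_faces(image, threshold):
    # Keep every candidate whose face score passes the low threshold.
    return [box for box, score in image["faces"] if score >= threshold]


def fake_crop_and_normalize(image, box):
    return box  # a real version would crop around the face and rescale the region


def fake_classify(region):
    table = {
        (10, 10): ("playing saxophone", 0.8),
        (50, 20): (NOT_PLAYING, 0.9),
    }
    return table[region]


image = {"faces": [((10, 10), 0.3), ((50, 20), 0.2), ((90, 5), 0.05)]}
detections = detect_playing(
    image, fake_detect_faces, fake_crop_and_normalize, fake_classify, face_threshold=0.1
)
print(detections)
```

Using a low face threshold trades extra candidate regions for fewer missed people; the eighth "not playing" class is what lets the classifier discard the spurious candidates afterwards.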
Example detections: playing saxophone, playing bassoon, playing French horn, playing saxophone, playing French horn
Area under the precision-recall curve:
• Our method: 45.7%;
• Spatial pyramid: 37.3%.
Failure cases (example labeled playing French horn): false detection; missed detection.
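For readers unfamiliar with the metric, the area under a precision-recall curve can be estimated from a ranked list of detections. The slides do not state the exact evaluation protocol, so the step-wise sum below (precision times the increase in recall at each rank) is one common estimate, assumed here for illustration; the scores and labels are made up.

```python
def average_precision(scores, is_correct, n_positives):
    """Step-wise area under the precision-recall curve for ranked detections.

    n_positives is the number of ground-truth instances, so missed
    detections (positives that never appear in the list) lower the result.
    """
    ranked = sorted(zip(scores, is_correct), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    previous_recall = 0.0
    area = 0.0
    for rank, (_, correct) in enumerate(ranked, start=1):
        if correct:
            true_positives += 1
        precision = true_positives / rank
        recall = true_positives / n_positives
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


# Four detections, two of them correct (one false detection ranked second).
ap_all_found = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False], n_positives=2)
# Same detections, but two further ground-truth instances were missed.
ap_with_misses = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False], n_positives=4)
print(ap_all_found, ap_with_misses)
```

Both kinds of failure shown on the slide affect the number: a false detection lowers precision at every later rank, and a missed detection caps the recall that can be reached.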
Examples of Mined Grouplets
• Playing bassoon
• Playing saxophone
• Playing violin
• Playing guitar
Conclusion
• Holistic image-based classification
[B. Yao and L. Fei-Fei. “Grouplet: A structured image representation for recognizing human and object interactions.” CVPR 2010.]
Vs.
• Detailed understanding and reasoning: pose estimation & object detection (The Next Talk)
[B. Yao and L. Fei-Fei. “Modeling mutual context of object and human pose in human-object interaction activities.” CVPR 2010.]
Example labels: playing saxophone, playing bassoon, playing saxophone
Thanks to Juan Carlos Niebles, Jia Deng, Jia Li, Hao Su, Silvio Savarese, and anonymous reviewers.
And You