Upload
colin-griffin
View
217
Download
0
Tags:
Embed Size (px)
Citation preview
Context-based Visual Concept Context-based Visual Concept Detection Using Domain Adaptive Detection Using Domain Adaptive
Semantic DiffusionSemantic Diffusion
Yu-Gang JiangDigital Video and Multimedia Lab, Columbia University
VIREO Research Group, City University of Hong Kong
1
Consider this question first
Can you find video clips containing fire?
Can you find video clips containing fire?
… …
…
time
Vid
eo 3
Vid
eo 2
Vid
eo 1
2
http://jaeger.earthsci.unimelb.edu.au/Images/Topographic/Whole_Earth/Earth_50.jpg
12,7
56.3
2 ki
lom
eter
s
3
• Texts (e.g., tags) are noisy even with human doing the annotation.• Need to look beyond texts!
The commercial search engines…
4
• There has been a great deal of interests in developing content-based approaches for video/image annotation.
– Everingham, Zisserman, Williams and Van Gool, Oxford Tech. Report, 2006.– Fei-Fei, Fergus and Perona, CVPR workshop 2004.– Grauman and Darrell, ICCV 2005.– Lazebnik, Schmid, and Ponce, CVPR 2006.– Jiang, Ngo, Yang, CIVR 2007.
Recent developments
sky, mountain, sky, mountain, tree, bridge, tree, bridge, river…river…
5
Recent developments
6
Keypoint extraction Visual word vocabulary
BoW histogram
Keypoint feature space
...
Clustering
............
v2 v3v1 v4
...
• Bag of visual words
J. Sivic et al, ICCV 2003
Our feature representation framework
7
Keypoint extraction
Visual word vocabulary 1
SIF
T fe
atur
e sp
ace
......... .........
Visual word vocabulary 2
.........
Visual word vocabulary 3
DoG Hessian Affine MSER
BoW histograms Using Soft-Weighting
... + + +
+ + ++ + ++ + +
+
- - -
- - -
- - -
- - - - - -
- - - - - - - - -
SVM classifiers
... ......
+ + +
+ + ++ + ++ + +
+
- - -
- - -
- - -
- - - - - -
- - - - - - - - -
+ + +
+ + ++ + +
- - -
- - -
- - -
- - - - - -
- - - - - - - - -
Vocabulary Generation BoW Representation
Chang TRECVID 2008; Jiang, Yang, Ngo & Hauptmann, IEEE TMM, to appear
Recent results on TRECVID
8
Limitations
• Most existing methods aim at the assignment of concept labels individually– but concepts do not occur in isolation!
• Domain change between training and testing data was not considered
military personnel
smoke
explosion_fire
road outdoor
vehicle
building
9
Documentary Videos
Broadcast News Videos
Method overview
• Domain adaptive semantic diffusion (DASD)
10
road vehicle
0.05
0.19
0.80
0.46
0.13
0.01
0.12
0.91
0.18
0.05
Water
0.11
0.58
0.10
0.13
0.02
Jiang, Wang, Chang, Ngo, ICCV 2009
Method overview (cont.)
• Domain adaptive semantic diffusion (DASD)– Semantic graph
• Nodes are concepts• Edges represent
concept correlation
– Graph diffusion• Smooth concept
detection scores w.r.t the concept correlation
vehicle
road
Water sky
0.10.20.80.50.1…0.4
0.10.60.10.10.0…0.8
0.00.40.50.20.8…0.7
0.00.10.90.20.1…0.3
11
Formulation
• Energy function
Detection score of concept ci on test samples Detection score of concept ci on test samples
Concept affinity matrixConcept affinity matrix
12
Formulation (cont.)
• Gradually smooth the function makes the detection scores in accordance with the concept relationships
Detection score smoothing processDetection score smoothing process
13
Formulation (cont.)
• Domain changes…
14
-- concept affinity matrix
Broadcast News Videos Documentary Videos
0.09
0.09
VEHICLE
0.640.20
0.24
0.29
DESERT
SKY
CLOUDS
WEAPON
0.17
CAR
PARKING_LOT
Formulation (cont.)
• Graph adaptation
Graph adaptation processGraph adaptation process
15
Iteration: 8Iteration: 12
0.16
0.09VEHICLE
0.640.20
0.24
0.29
DESERT
SKY
CLOUDS
WEAPON
0.10
PARKING_LOT
CAR0.16
0.09VEHICLE
0.640.19
0.18
0.32
DESERT
SKY
CLOUDS
WEAPON
0.13
PARKING_LOT
CAR
Iteration: 0Iteration: 4
0.15
0.09VEHICLE
0.640.17
0.12
0.34
DESERT
SKY
CLOUDS
WEAPON
0.16
PARKING_LOT
CAR0.15
0.08VEHICLE
0.640.17
0.05
0.38
DESERT
SKY
CLOUDS
WEAPON
0.19
PARKING_LOT
CAR
Iteration: 16
0.15
0.08VEHICLE
0.640.16
0.00
0.42
DESERT
SKY
CLOUDS
WEAPON
0.24
PARKING_LOT
CAR0.15
0.08VEHICLE
0.640.16
0.00
0.43
DESERT
SKY
CLOUDS
WEAPON
0.27
PARKING_LOT
CAR
Iteration: 20
Graph adaptation - example
Broadcast news video domainBroadcast news video domain Documentary video domainDocumentary video domain
16
Experiments
• Datasets– TRECVID 2005, 2006, 2007
• Baseline detectors: VIREO-374
• Graph construction:– Ground-truth labels on TRECVID 2005
SPORTSSPORTS WEATHERWEATHER
OFFICEOFFICE BUILDINGBUILDING
DESERTDESERT MOUNTAINMOUNTAIN
WALKINGWALKING
PEOPLE-PEOPLE-MARCHINGMARCHING
EXPLOSION-EXPLOSION-FIREFIRE
MAPMAP
TRUCKTRUCK
CORP. LEADERCORP. LEADER
SPORTS WEATHER OFFICE
DESERTDESERT MOUNTAINMOUNTAIN WATERWATER
POLICEPOLICE MILITARYMILITARY ANIMALANIMAL TWO PEOPLETWO PEOPLE
NIGHT TIMENIGHT TIME TELEPHONETELEPHONE
STREETSTREET
CLASSROOMBUS
TRECVID 05/06 (Broadcast News Videos)TRECVID 05/06 (Broadcast News Videos) TRECVID 07 (Documentary Videos)TRECVID 07 (Documentary Videos)
17
Results
• Performance gain on TRECVID 05-07 Datasets
TRECVID- 2005 2006 2007
# of evaluated concepts 39 20 20
Baseline (MAP) 0.166 0.154 0.099
SD 11.8% 15.6% 12.1%
DASD 11.9% 17.5% 16.2%
18
SD: semantic diffusion Consistent improvement over all 3 data sets
DASD: domain adaptive semantic diffusion Graph adaptation further improves the performance
Results (cont.)
TRECVID 2006 Test Data
• Per-concept performance
19
Results (cont.)
20
TRECVID Jiang et al Aytar et al Weng et al DASD
2005 2.2% 4.0% N/A 11.9%
2006 N/A N/A 16.7% 17.5%
Comparison with the state-of-the-artsComparison with the state-of-the-arts
various baseline detectors (TRECVID-06)various baseline detectors (TRECVID-06)
Computational time
• Complexity is O(mn) –m: # concepts; n: # video shots
• Only 2 milliseconds per shot/keyframe!
21
TRECVID 05 TRECVID 06 TRECVID 07
SD 59s 84s 12s
DASD 89s 165s 28s
Single image/frame computation time
Summary
22
• A judicious approach using local features achieves very impressive results for visual annotation
• Context information is helpful !– Domain adaptive semantic diffusion
• effective for enhancing video annotation accuracy• can alleviate the effect of data domain changes• highly efficient
– Future directions include:• detector reliability: diffusion over directed graph• web data annotation: utilize contextual information to improve
the quality of tags