22
Context-based Visual Concept Context-based Visual Concept Detection Using Domain Detection Using Domain Adaptive Semantic Diffusion Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO Research Group, City University of Hong Kong 1

Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Embed Size (px)

Citation preview

Page 1: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Context-based Visual Concept Context-based Visual Concept Detection Using Domain Adaptive Detection Using Domain Adaptive

Semantic DiffusionSemantic Diffusion

Yu-Gang JiangDigital Video and Multimedia Lab, Columbia University

VIREO Research Group, City University of Hong Kong

1

Page 2: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Consider this question first

Can you find video clips containing fire?

Can you find video clips containing fire?

… …

time

Vid

eo 3

Vid

eo 2

Vid

eo 1

2

Page 3: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

http://jaeger.earthsci.unimelb.edu.au/Images/Topographic/Whole_Earth/Earth_50.jpg

12,7

56.3

2 ki

lom

eter

s

3

Page 4: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

• Texts (e.g., tags) are noisy even with human doing the annotation.• Need to look beyond texts!

The commercial search engines…

4

Page 5: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

• There has been a great deal of interests in developing content-based approaches for video/image annotation.

– Everingham, Zisserman, Williams and Van Gool, Oxford Tech. Report, 2006.– Fei-Fei, Fergus and Perona, CVPR workshop 2004.– Grauman and Darrell, ICCV 2005.– Lazebnik, Schmid, and Ponce, CVPR 2006.– Jiang, Ngo, Yang, CIVR 2007.

Recent developments

sky, mountain, sky, mountain, tree, bridge, tree, bridge, river…river…

5

Page 6: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Recent developments

6

Keypoint extraction Visual word vocabulary

BoW histogram

Keypoint feature space

...

Clustering

............

v2 v3v1 v4

...

• Bag of visual words

J. Sivic et al, ICCV 2003

Page 7: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Our feature representation framework

7

Keypoint extraction

Visual word vocabulary 1

SIF

T fe

atur

e sp

ace

......... .........

Visual word vocabulary 2

.........

Visual word vocabulary 3

DoG Hessian Affine MSER

BoW histograms Using Soft-Weighting

... + + +

+ + ++ + ++ + +

+

- - -

- - -

- - -

- - - - - -

- - - - - - - - -

SVM classifiers

... ......

+ + +

+ + ++ + ++ + +

+

- - -

- - -

- - -

- - - - - -

- - - - - - - - -

+ + +

+ + ++ + +

- - -

- - -

- - -

- - - - - -

- - - - - - - - -

Vocabulary Generation BoW Representation

Chang TRECVID 2008; Jiang, Yang, Ngo & Hauptmann, IEEE TMM, to appear

Page 8: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Recent results on TRECVID

8

Page 9: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Limitations

• Most existing methods aim at the assignment of concept labels individually– but concepts do not occur in isolation!

• Domain change between training and testing data was not considered

military personnel

smoke

explosion_fire

road outdoor

vehicle

building

9

Documentary Videos

Broadcast News Videos

Page 10: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Method overview

• Domain adaptive semantic diffusion (DASD)

10

road vehicle

0.05

0.19

0.80

0.46

0.13

0.01

0.12

0.91

0.18

0.05

Water

0.11

0.58

0.10

0.13

0.02

Jiang, Wang, Chang, Ngo, ICCV 2009

Page 11: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Method overview (cont.)

• Domain adaptive semantic diffusion (DASD)– Semantic graph

• Nodes are concepts• Edges represent

concept correlation

– Graph diffusion• Smooth concept

detection scores w.r.t the concept correlation

vehicle

road

Water sky

0.10.20.80.50.1…0.4

0.10.60.10.10.0…0.8

0.00.40.50.20.8…0.7

0.00.10.90.20.1…0.3

11

Page 12: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Formulation

• Energy function

Detection score of concept ci on test samples Detection score of concept ci on test samples

Concept affinity matrixConcept affinity matrix

12

Page 13: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Formulation (cont.)

• Gradually smooth the function makes the detection scores in accordance with the concept relationships

Detection score smoothing processDetection score smoothing process

13

Page 14: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Formulation (cont.)

• Domain changes…

14

-- concept affinity matrix

Broadcast News Videos Documentary Videos

0.09

0.09

VEHICLE

0.640.20

0.24

0.29

DESERT

SKY

CLOUDS

WEAPON

0.17

CAR

PARKING_LOT

Page 15: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Formulation (cont.)

• Graph adaptation

Graph adaptation processGraph adaptation process

15

Page 16: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Iteration: 8Iteration: 12

0.16

0.09VEHICLE

0.640.20

0.24

0.29

DESERT

SKY

CLOUDS

WEAPON

0.10

PARKING_LOT

CAR0.16

0.09VEHICLE

0.640.19

0.18

0.32

DESERT

SKY

CLOUDS

WEAPON

0.13

PARKING_LOT

CAR

Iteration: 0Iteration: 4

0.15

0.09VEHICLE

0.640.17

0.12

0.34

DESERT

SKY

CLOUDS

WEAPON

0.16

PARKING_LOT

CAR0.15

0.08VEHICLE

0.640.17

0.05

0.38

DESERT

SKY

CLOUDS

WEAPON

0.19

PARKING_LOT

CAR

Iteration: 16

0.15

0.08VEHICLE

0.640.16

0.00

0.42

DESERT

SKY

CLOUDS

WEAPON

0.24

PARKING_LOT

CAR0.15

0.08VEHICLE

0.640.16

0.00

0.43

DESERT

SKY

CLOUDS

WEAPON

0.27

PARKING_LOT

CAR

Iteration: 20

Graph adaptation - example

Broadcast news video domainBroadcast news video domain Documentary video domainDocumentary video domain

16

Page 17: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Experiments

• Datasets– TRECVID 2005, 2006, 2007

• Baseline detectors: VIREO-374

• Graph construction:– Ground-truth labels on TRECVID 2005

SPORTSSPORTS WEATHERWEATHER

OFFICEOFFICE BUILDINGBUILDING

DESERTDESERT MOUNTAINMOUNTAIN

WALKINGWALKING

PEOPLE-PEOPLE-MARCHINGMARCHING

EXPLOSION-EXPLOSION-FIREFIRE

MAPMAP

TRUCKTRUCK

CORP. LEADERCORP. LEADER

SPORTS WEATHER OFFICE

DESERTDESERT MOUNTAINMOUNTAIN WATERWATER

POLICEPOLICE MILITARYMILITARY ANIMALANIMAL TWO PEOPLETWO PEOPLE

NIGHT TIMENIGHT TIME TELEPHONETELEPHONE

STREETSTREET

CLASSROOMBUS

TRECVID 05/06 (Broadcast News Videos)TRECVID 05/06 (Broadcast News Videos) TRECVID 07 (Documentary Videos)TRECVID 07 (Documentary Videos)

17

Page 18: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Results

• Performance gain on TRECVID 05-07 Datasets

TRECVID- 2005 2006 2007

# of evaluated concepts 39 20 20

Baseline (MAP) 0.166 0.154 0.099

SD 11.8% 15.6% 12.1%

DASD 11.9% 17.5% 16.2%

18

SD: semantic diffusion Consistent improvement over all 3 data sets

DASD: domain adaptive semantic diffusion Graph adaptation further improves the performance

Page 19: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Results (cont.)

TRECVID 2006 Test Data

• Per-concept performance

19

Page 20: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Results (cont.)

20

TRECVID Jiang et al Aytar et al Weng et al DASD

2005 2.2% 4.0% N/A 11.9%

2006 N/A N/A 16.7% 17.5%

Comparison with the state-of-the-artsComparison with the state-of-the-arts

various baseline detectors (TRECVID-06)various baseline detectors (TRECVID-06)

Page 21: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Computational time

• Complexity is O(mn) –m: # concepts; n: # video shots

• Only 2 milliseconds per shot/keyframe!

21

TRECVID 05 TRECVID 06 TRECVID 07

SD 59s 84s 12s

DASD 89s 165s 28s

Single image/frame computation time

Page 22: Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Summary

22

• A judicious approach using local features achieves very impressive results for visual annotation

• Context information is helpful !– Domain adaptive semantic diffusion

• effective for enhancing video annotation accuracy• can alleviate the effect of data domain changes• highly efficient

– Future directions include:• detector reliability: diffusion over directed graph• web data annotation: utilize contextual information to improve

the quality of tags