Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO

Context-based Visual Concept Context-based Visual Concept Detection Using Domain Adaptive Detection Using Domain Adaptive

Semantic DiffusionSemantic Diffusion

Yu-Gang JiangDigital Video and Multimedia Lab, Columbia University

VIREO Research Group, City University of Hong Kong

1

Consider this question first

Can you find video clips containing fire?

Can you find video clips containing fire?

… …

…

time

Vid

eo 3

Vid

eo 2

Vid

eo 1

2

http://jaeger.earthsci.unimelb.edu.au/Images/Topographic/Whole_Earth/Earth_50.jpg

12,7

56.3

2 ki

lom

eter

s

3

• Texts (e.g., tags) are noisy even with human doing the annotation.• Need to look beyond texts!

The commercial search engines…

4

• There has been a great deal of interests in developing content-based approaches for video/image annotation.

– Everingham, Zisserman, Williams and Van Gool, Oxford Tech. Report, 2006.– Fei-Fei, Fergus and Perona, CVPR workshop 2004.– Grauman and Darrell, ICCV 2005.– Lazebnik, Schmid, and Ponce, CVPR 2006.– Jiang, Ngo, Yang, CIVR 2007.

Recent developments

sky, mountain, sky, mountain, tree, bridge, tree, bridge, river…river…

5

Recent developments

6

Keypoint extraction Visual word vocabulary

BoW histogram

Keypoint feature space

...

Clustering

............

v2 v3v1 v4

...

• Bag of visual words

J. Sivic et al, ICCV 2003

Our feature representation framework

7

Keypoint extraction

Visual word vocabulary 1

SIF

T fe

atur

e sp

ace

......... .........


.........


DoG Hessian Affine MSER

BoW histograms Using Soft-Weighting

... + + +

+ + ++ + ++ + +

+

- - -

- - -

- - -

- - - - - -

- - - - - - - - -

SVM classifiers

... ......

+ + +

+ + ++ + ++ + +

+

- - -

- - -

- - -

- - - - - -

- - - - - - - - -

+ + +

+ + ++ + +

- - -

- - -

- - -

- - - - - -

- - - - - - - - -

Vocabulary Generation BoW Representation

Chang TRECVID 2008; Jiang, Yang, Ngo & Hauptmann, IEEE TMM, to appear

Recent results on TRECVID

8

Limitations

• Most existing methods aim at the assignment of concept labels individually– but concepts do not occur in isolation!

• Domain change between training and testing data was not considered

military personnel

smoke

explosion_fire

road outdoor

vehicle

building

9

Documentary Videos

Broadcast News Videos

Method overview

• Domain adaptive semantic diffusion (DASD)

10

road vehicle

0.05

0.19

0.80

0.46

0.13

0.01

0.12

0.91

0.18

0.05

Water

0.11

0.58

0.10

0.13

0.02

Jiang, Wang, Chang, Ngo, ICCV 2009

Method overview (cont.)

• Domain adaptive semantic diffusion (DASD)– Semantic graph

• Nodes are concepts• Edges represent

concept correlation

– Graph diffusion• Smooth concept

detection scores w.r.t the concept correlation

vehicle

road

Water sky

0.10.20.80.50.1…0.4

0.10.60.10.10.0…0.8

0.00.40.50.20.8…0.7

0.00.10.90.20.1…0.3

11

Formulation

• Energy function

Detection score of concept ci on test samples Detection score of concept ci on test samples

Concept affinity matrixConcept affinity matrix

12

Formulation (cont.)

• Gradually smooth the function makes the detection scores in accordance with the concept relationships

Detection score smoothing processDetection score smoothing process

13

Formulation (cont.)

• Domain changes…

14

-- concept affinity matrix

Broadcast News Videos Documentary Videos

0.09

0.09

VEHICLE

0.640.20

0.24

0.29

DESERT

SKY

CLOUDS

WEAPON

0.17

CAR

PARKING_LOT

Formulation (cont.)

• Graph adaptation

Graph adaptation processGraph adaptation process

15

Iteration: 8Iteration: 12

0.16

0.09VEHICLE

0.640.20

0.24

0.29

DESERT

SKY

CLOUDS

WEAPON

0.10

PARKING_LOT

CAR0.16

0.09VEHICLE

0.640.19

0.18

0.32

DESERT

SKY

CLOUDS

WEAPON

0.13

PARKING_LOT

CAR

Iteration: 0Iteration: 4

0.15

0.09VEHICLE

0.640.17

0.12

0.34

DESERT

SKY

CLOUDS

WEAPON

0.16

PARKING_LOT

CAR0.15

0.08VEHICLE

0.640.17

0.05

0.38

DESERT

SKY

CLOUDS

WEAPON

0.19

PARKING_LOT

CAR

Iteration: 16

0.15

0.08VEHICLE

0.640.16

0.00

0.42

DESERT

SKY

CLOUDS

WEAPON

0.24

PARKING_LOT

CAR0.15

0.08VEHICLE

0.640.16

0.00

0.43

DESERT

SKY

CLOUDS

WEAPON

0.27

PARKING_LOT

CAR

Iteration: 20

Graph adaptation - example

Broadcast news video domainBroadcast news video domain Documentary video domainDocumentary video domain

16

Experiments

• Datasets– TRECVID 2005, 2006, 2007

• Baseline detectors: VIREO-374

• Graph construction:– Ground-truth labels on TRECVID 2005

SPORTSSPORTS WEATHERWEATHER

OFFICEOFFICE BUILDINGBUILDING

DESERTDESERT MOUNTAINMOUNTAIN

WALKINGWALKING

PEOPLE-PEOPLE-MARCHINGMARCHING

EXPLOSION-EXPLOSION-FIREFIRE

MAPMAP

TRUCKTRUCK

CORP. LEADERCORP. LEADER

SPORTS WEATHER OFFICE

DESERTDESERT MOUNTAINMOUNTAIN WATERWATER

POLICEPOLICE MILITARYMILITARY ANIMALANIMAL TWO PEOPLETWO PEOPLE

NIGHT TIMENIGHT TIME TELEPHONETELEPHONE

STREETSTREET

CLASSROOMBUS

TRECVID 05/06 (Broadcast News Videos)TRECVID 05/06 (Broadcast News Videos) TRECVID 07 (Documentary Videos)TRECVID 07 (Documentary Videos)

17

Results

• Performance gain on TRECVID 05-07 Datasets

TRECVID- 2005 2006 2007

# of evaluated concepts 39 20 20

Baseline (MAP) 0.166 0.154 0.099

SD 11.8% 15.6% 12.1%

DASD 11.9% 17.5% 16.2%

18

SD: semantic diffusion Consistent improvement over all 3 data sets

DASD: domain adaptive semantic diffusion Graph adaptation further improves the performance

Results (cont.)

TRECVID 2006 Test Data

• Per-concept performance

19

Results (cont.)

20

TRECVID Jiang et al Aytar et al Weng et al DASD

2005 2.2% 4.0% N/A 11.9%

2006 N/A N/A 16.7% 17.5%

Comparison with the state-of-the-artsComparison with the state-of-the-arts

various baseline detectors (TRECVID-06)various baseline detectors (TRECVID-06)

Computational time

• Complexity is O(mn) –m: # concepts; n: # video shots

• Only 2 milliseconds per shot/keyframe!

21

TRECVID 05 TRECVID 06 TRECVID 07

SD 59s 84s 12s

DASD 89s 165s 28s

Single image/frame computation time

Summary

22

• A judicious approach using local features achieves very impressive results for visual annotation

• Context information is helpful !– Domain adaptive semantic diffusion

• effective for enhancing video annotation accuracy• can alleviate the effect of data domain changes• highly efficient

– Future directions include:• detector reliability: diffusion over directed graph• web data annotation: utilize contextual information to improve

the quality of tags

Documents

Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion Yu-Gang Jiang Digital Video and Multimedia Lab, Columbia University VIREO