Challenges in Mining Large Image Datasets


Jelena Tešić, B.S. Manjunath, University of California, Santa Barbara

http://vision.ece.ucsb.edu

Vision Research Lab

Introduction
- Data and event representation
- Meaningful data summarization
- Modeling of high-level human concepts
- Learning events
- Feature space and perceptual relations
- Mining image datasets
  - Feature set size and dimension
  - Size and nature of the image dataset

Aerial images of Santa Barbara (SB) County: 54 images of 5428 × 5428 pixels, yielding 177,174 tiles of 128 × 128 pixels.


Visual Thesaurus: Perceptual Classification

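1. Set T = 1; reduce the dimensionality of the input training feature space with a self-organizing map (SOM).
2. Assign class labels to the SOM output.
3. Fine-tune the class boundaries with learning vector quantization (LVQ).
4. If T < number_of_iterations, set T = T + 1 and go back to step 2; otherwise, stop.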
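The four-step training loop can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the SOM grid size, the learning-rate and neighborhood schedules, the majority-vote node labeling, and the use of the basic LVQ1 update are all choices made here, and all function names are hypothetical.

```python
import numpy as np

def _nearest(W, X):
    """Index of the closest prototype in W for every row of X."""
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)

def train_som(X, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Step 1: a minimal 2-D SOM mapping training features onto grid nodes."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.choice(len(X), rows * cols, replace=len(X) < rows * cols)].astype(float)
    total, step = epochs * len(X), 0
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            frac = step / total
            lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 0.1
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))  # best-matching unit
            h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)  # pull the grid neighborhood toward x
            step += 1
    return W

def label_nodes(W, X, y):
    """Step 2: label each node by majority vote of the training vectors it wins."""
    win = _nearest(W, X)
    labels = np.empty(len(W), dtype=y.dtype)
    for k in range(len(W)):
        votes = y[win == k]
        # a node that wins nothing takes the label of its nearest training vector
        labels[k] = np.bincount(votes).argmax() if votes.size else y[_nearest(X, W[k:k + 1])[0]]
    return labels

def lvq1(W, labels, X, y, epochs=10, lr0=0.05, seed=0):
    """Step 3: LVQ1; the winner moves toward same-class samples and away otherwise."""
    rng = np.random.default_rng(seed)
    W = W.copy()
    for e in range(epochs):
        lr = lr0 * (1 - e / epochs)
        for i in rng.permutation(len(X)):
            k = np.argmin(((W - X[i]) ** 2).sum(axis=1))
            W[k] += (lr if labels[k] == y[i] else -lr) * (X[i] - W[k])
    return W

def perceptual_classifier(X, y, n_iterations=3):
    W = train_som(X)                          # T = 1, step 1
    for T in range(1, n_iterations + 1):      # step 4 loops back to step 2
        labels = label_nodes(W, X, y)         # step 2
        W = lvq1(W, labels, X, y, seed=T)     # step 3
    return W, labels

def classify(W, labels, X):
    return labels[_nearest(W, X)]
```

After training with `W, labels = perceptual_classifier(X, y)`, a new tile feature vector is assigned the label of its nearest tuned prototype via `classify(W, labels, X_new)`.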

[Figure: perceptual and feature spaces brought together; tiles of the same class (16) and of class 17.]

Thesaurus entries: Generalized Lloyd Algorithm (GLA), 330 codewords
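The Generalized Lloyd Algorithm alternates two steps, assigning every training vector to its nearest codeword and moving every codeword to the centroid of its cell, until the mean distortion stops decreasing. A minimal NumPy sketch, assuming squared Euclidean distortion and random initialization from the training vectors (rather than the codeword-splitting initialization often paired with GLA); the function names are hypothetical and the demo uses a much smaller codebook than the 330 codewords above.

```python
import numpy as np

def encode(X, codebook):
    """Nearest-codeword assignment and the resulting mean squared distortion."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    return assign, d2[np.arange(len(X)), assign].mean()

def generalized_lloyd(X, n_codewords, tol=1e-6, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    codebook = X[rng.choice(len(X), n_codewords, replace=False)].astype(float)
    prev = np.inf
    for _ in range(max_iter):
        assign, distortion = encode(X, codebook)       # nearest-neighbor condition
        if prev - distortion <= tol * distortion:      # relative improvement too small
            break
        prev = distortion
        for k in range(n_codewords):                   # centroid condition
            members = X[assign == k]
            if members.size:                           # empty cells keep their codeword
                codebook[k] = members.mean(axis=0)
    assign, distortion = encode(X, codebook)           # assignment for the final codebook
    return codebook, assign, distortion
```

Each step can only lower (or keep) the distortion, which is why the loop terminates on a relative-improvement test.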


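Spatial Event Cubes (SEC)

[Figure: thesaurus entries assigned to the image tile raster space; a spatial binary relation ρ (distance, direction) between tiles p and q at raster positions (x, y), labeled with thesaurus entries u and v; the SEC face value Cρ(u, v); multimode SEC built from COLOR and TEXTURE.]

Image tile raster space: $R = \{ (x, y) \mid x \in [1, M],\ y \in [1, N] \}$

Thesaurus entries: $\mathcal{T} = \{ t \mid t \text{ is a thesaurus entry/codeword} \}$, with the tile labeling $\tau : R \to \mathcal{T}$

Spatial binary relation: $\rho \subseteq R \times R$, with $p, q \in R$, $u, v \in \mathcal{T}$, $\tau(p) = u$, $\tau(q) = v$

SEC face values: $C_\rho(u, v) = \left| \{ (p, q) \mid (p \,\rho\, q) \wedge (\tau(p) = u) \wedge (\tau(q) = v) \} \right|$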
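The face-value definition can be computed directly from a raster of tile labels. A minimal sketch, assuming the relation ρ is "q is the tile at a fixed displacement (dy, dx) from p", which encodes one distance and one direction; the function name and the toy raster are hypothetical. A multimode SEC would repeat this with labelings from separate codebooks (for example color and texture).

```python
import numpy as np

def spatial_event_cube(tau, n_entries, offset):
    """SEC face values C_rho(u, v) for a tile-label raster tau of shape (M, N).

    Assumed relation: p rho q iff q = p + offset, with offset = (dy, dx).
    """
    dy, dx = offset
    M, N = tau.shape
    # tiles p whose displaced partner q still lies inside the raster R
    p = tau[max(0, -dy):M - max(0, dy), max(0, -dx):N - max(0, dx)]
    q = tau[max(0, dy):M - max(0, -dy), max(0, dx):N - max(0, -dx)]
    C = np.zeros((n_entries, n_entries), dtype=np.int64)
    # count each pair once at (tau(p), tau(q)); add.at accumulates repeated indices
    np.add.at(C, (p.ravel(), q.ravel()), 1)
    return C

tau = np.array([[0, 0, 1],
                [2, 1, 1]])
C = spatial_event_cube(tau, 3, (0, 1))  # right-hand neighbor relation
```

For the toy raster, the right-hand neighbor pairs are (0, 0), (0, 1), (2, 1), and (1, 1), so each of those four cells of `C` holds a count of 1.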


Visual Data Mining: SEC

[Figure: SEC face values, a sparse matrix of co-occurrence counts with a few large entries (up to 4215) and the value 434 marked; Cluster Analysis plot.]


Spatial Data Mining

Generalized Apriori:
1. Find all sets of tuples that have minimum support.
2. Use the frequent itemsets to generate the desired rules.
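Step 2 can be sketched with the classic Apriori rule step, where the confidence of a rule X → Y is the support of X ∪ Y divided by the support of X. This is an illustration under assumptions: the support counts arrive as a hypothetical dictionary covering every frequent itemset (Apriori guarantees that every subset of a frequent itemset is also frequent), and the item names are made up.

```python
from itertools import combinations

def generate_rules(support, min_conf):
    """support: dict mapping frozenset itemsets to support counts."""
    rules = []
    for itemset, s in support.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = s / support[lhs]          # supp(X u Y) / supp(X)
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules

support = {frozenset({"ocean"}): 4, frozenset({"beach"}): 5,
           frozenset({"ocean", "beach"}): 3}
rules = generate_rules(support, min_conf=0.7)
```

Here ocean → beach has confidence 3/4 and is kept, while beach → ocean (3/5) falls below the threshold.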

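Low-level mining

[Figure: occurrence of the ocean in the image dataset, 2D and 3D views.]

$F_1 = \{ u_i \mid u_i \text{ has at least the minimum support} \}$

$A = [\alpha_{ij}]$, with $\alpha_{ij} = 1$ if $C_\rho(u_i, u_j) \ge S$ and $\alpha_{ij} = 0$ otherwise, where $S$ is the minimum support

$F_2 = \{ (u_i, u_j) \mid \alpha_{ij} = 1 \}$

for ($K = 3$; $F_{K-1} \neq \emptyset$; $K{+}{+}$) do {
  $F_K = \{ (u_{i_1}, \ldots, u_{i_K}) \mid (u_{j_1}, \ldots, u_{j_{K-1}}) \in F_{K-1}\ \ \forall (j_1, \ldots, j_{K-1}) \in P_{K-1}(i_1, \ldots, i_K) \}$
}

where $1 \le i_1 < i_2 < \cdots < i_K \le N$ and $P_{K-1}(i_1, \ldots, i_K)$ is the set of all $(K-1)$-element subsequences of $(i_1, \ldots, i_K)$.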
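The candidate-generation loop can be sketched in Python. Assumptions, not taken from the slides: the SEC matrix is symmetric (for example a distance-only relation ρ), so pairs are kept with i < j; the sketch starts from $F_2$ and applies only the subset condition shown for $F_K$; the `max_K` cap and the toy counts are hypothetical.

```python
from itertools import combinations
import numpy as np

def frequent_tuples(C, S, max_K=5):
    """F_2 from SEC counts, then F_K = K-tuples whose (K-1)-subtuples are all in F_{K-1}."""
    N = C.shape[0]
    alpha = C >= S                                   # alpha_ij = 1 when C_rho(u_i, u_j) >= S
    F = {2: {(i, j) for i in range(N) for j in range(i + 1, N) if alpha[i, j]}}
    K = 3
    while K <= max_K and F[K - 1]:
        items = sorted({i for t in F[K - 1] for i in t})  # entries still in play
        # combinations of a sorted list yield i_1 < i_2 < ... < i_K
        F[K] = {cand for cand in combinations(items, K)
                if all(sub in F[K - 1] for sub in combinations(cand, K - 1))}
        K += 1
    return F

C = np.zeros((4, 4), dtype=int)
for i, j, c in [(0, 1, 50), (0, 2, 40), (1, 2, 30), (2, 3, 25), (1, 3, 5)]:
    C[i, j] = C[j, i] = c
F = frequent_tuples(C, S=20)
```

With minimum support 20, the pair (1, 3) drops out, so the only triple whose three sub-pairs are all frequent is (0, 1, 2), and no 4-tuple survives.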


Higher-Level Mining

[Figure: ocean analysis, with panel values 653, 890, and 434.]


Conclusion
- Visual mining framework
- Spatial event representation
- Image analysis at a conceptual level
- Perceptual knowledge discovery

Demos: http://vision.ece.ucsb.edu/texture/mpeg7/ http://nayana.ece.ucsb.edu/registration/

Amazon forest DV video: 40 hours, 5 TB; mosaics from 2 h


Adaptive NN Search for Relevance Feedback

Relevance feedback: learn the user's subjective similarity measure

Scalable solution:
- Exploit the correlation between consecutive NN searches
- VA-file indexing

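Feature space: $F_i = [f_{i1}, f_{i2}, \ldots, f_{iM}]$

Query: $Q = [q_1, q_2, \ldots, q_M]$

Distance measure: $d(Q, F_i, W_t) = (Q - F_i)^T W_t (Q - F_i)$

$R_t$: the K nearest neighbors at iteration t

$r_t(Q)$: the distance between Q and the farthest of the K nearest neighbors in $R_t$

Upper bound: $r^u_{t+1}(Q) = \max\{ d(Q, F_i, W_{t+1}) \mid F_i \in R_t \}$, the maximum over the K objects of $R_t$. Because these K objects lie within $r^u_{t+1}(Q)$ of Q under the new weights, $r_{t+1}(Q) \le r^u_{t+1}(Q)$.

$\phi(Q, W_t)$: the K-th largest upper bound of all approximations

Bounds from the VA-file approximation $P(F_i)$ of $F_i$: $L_i(Q, W_t) \le d(Q, F_i, W_t) \le U_i(Q, W_t)$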
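These definitions can be exercised with a small NumPy sketch of one relevance-feedback round. It rests on assumptions that are not in the slides: a diagonal weight matrix $W_t = \mathrm{diag}(w)$, a hypothetical uniform-grid VA-file approximation of features in $[0, 1)$, and $\phi$ taken as the usual VA-file filtering threshold (the K-th smallest upper bound, i.e. the largest of the K smallest). All function names are illustrative.

```python
import numpy as np

def weighted_dist(Q, F, w):
    """d(Q, F_i, W) = (Q - F_i)^T W (Q - F_i), assuming W = diag(w)."""
    diff = F - Q
    return (diff * diff * w).sum(axis=-1)

def va_approximate(F, bits):
    """Hypothetical approximation P(F_i): uniform cells per dimension on [0, 1)."""
    cells = 2 ** bits
    return np.minimum((F * cells).astype(int), cells - 1), cells

def cell_bounds(Q, idx, cells, w):
    """L_i <= d(Q, F_i, W) <= U_i, computed from the cells alone (w >= 0)."""
    lo, hi = idx / cells, (idx + 1) / cells
    nearest = np.clip(Q, lo, hi)                                  # closest cell point to Q
    farthest = np.where(np.abs(Q - lo) > np.abs(Q - hi), lo, hi)  # farthest cell corner
    L = ((nearest - Q) ** 2 * w).sum(axis=1)
    U = ((farthest - Q) ** 2 * w).sum(axis=1)
    return L, U

def exact_knn(Q, F, w, K):
    return np.argsort(weighted_dist(Q, F, w))[:K]

def adaptive_knn(Q, F, idx, cells, w_new, prev_nn, K):
    """Filter with min(r^u_{t+1}, phi), then refine the survivors exactly."""
    # r^u_{t+1}(Q): the K neighbors of iteration t, re-measured with the new weights
    r_u = weighted_dist(Q, F[prev_nn], w_new).max()
    L, U = cell_bounds(Q, idx, cells, w_new)
    phi = np.partition(U, K - 1)[K - 1]          # K-th smallest upper bound (assumption)
    traditional = np.flatnonzero(L <= phi)
    adaptive = np.flatnonzero(L <= min(r_u, phi))
    d = weighted_dist(Q, F[adaptive], w_new)
    return adaptive[np.argsort(d)[:K]], adaptive.size, traditional.size

rng = np.random.default_rng(0)
F = rng.random((3000, 6))
Q = rng.random(6)
w_t, w_next = np.ones(6), rng.uniform(0.5, 1.5, 6)
idx, cells = va_approximate(F, 3)
prev = exact_knn(Q, F, w_t, 10)
nn, n_adaptive, n_traditional = adaptive_knn(Q, F, idx, cells, w_next, prev, 10)
```

Every true neighbor satisfies $L_i \le d \le r_{t+1}(Q) \le \min(r^u_{t+1}(Q), \phi)$, so the tighter filter still returns the exact K nearest neighbors while keeping no more candidates than the $\phi$-only filter.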


Adaptive NN Search for Relevance Feedback

If $P(F_i)$ is a qualified candidate for $N^{opt}_{t+1}(Q, W)$, its lower bound must satisfy $L_i(Q, W_{t+1}) \le r^u_{t+1}(Q)$. When $r^u_{t+1}(Q) \le \phi(Q, W_{t+1})$, more candidates are guaranteed to be excluded than with the traditional search.

[Figure: illustration of $r^u_2$, $r_2$, and $\phi(Q, W_2)$.]
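The exclusion rule rests on a short chain of inequalities built from the bounds defined for the adaptive search. Written out as a sketch, assuming (as in a standard VA-file filter) that the traditional search keeps every approximation whose lower bound does not exceed $\phi(Q, W_{t+1})$:

```latex
\begin{align*}
&F_i \in N^{opt}_{t+1}(Q, W) \;\Rightarrow\;
  L_i(Q, W_{t+1}) \le d(Q, F_i, W_{t+1}) \le r_{t+1}(Q) \le r^u_{t+1}(Q),\\
&\text{so}\quad L_i(Q, W_{t+1}) > r^u_{t+1}(Q) \;\Rightarrow\; F_i \notin N^{opt}_{t+1}(Q, W);\\
&r^u_{t+1}(Q) \le \phi(Q, W_{t+1}) \;\Rightarrow\;
  \{\, i \mid L_i(Q, W_{t+1}) \le r^u_{t+1}(Q) \,\} \subseteq \{\, i \mid L_i(Q, W_{t+1}) \le \phi(Q, W_{t+1}) \,\}.
\end{align*}
```

The last line is the guarantee: the adaptive candidate set is never larger than the traditional one.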


Performance Evaluation: 685,900 images

[Figure: $r^u_t(Q)$ vs. $\phi(Q, W_t)$. Their difference is larger at a coarser resolution.]

[Figure: the estimate vs. $\phi(Q, W_t)$. At a coarser resolution, the estimate is better.]


Performance Evaluation: Adaptive NN Search
- Utilizing the correlation to confine the search space
- The constraints can be computed efficiently
- Significant savings on disk accesses

[Figure: $N_1(\text{traditional}) / N_1(\text{proposed})$, and $N_1(\text{traditional})$ vs. $N_1(\text{proposed})$.]
