Learning the Relative Importance of Features in Image Data

1

Aparna Varde, Elke Rundensteiner, Giti Javidi, Ehsan Sheybani and Jianyu Liang

IEEE ICDE’s DBRank, Istanbul, Turkey

April 2007

2

Introduction

- Scientific domains
  - Images from phenomena
- Image Features
  - Visual Features
  - Metadata Features
- Comparison of Images
  - Based on features

Figure: example images (Silicon Nanopore, Herb Leaf)

3

Motivation

- Consider a similarity search process
- Some features more important than others
- Experts have subjective notions of comparison
- Need to learn feature-based distance function

Figure: Target Image and Source Images

4

Goals

- Given
  - Training data on images and their applicable features
- Learn
  - Distance function for image comparison
  - Function should preserve relative importance of features in the domain

5

Proposed Approach: FeaturesRank

- Input
  - Training samples: pairs of images
  - Level of similarity for each pair
- Distance function: weighted sum of features
- Process: Iterative approach
  - Cluster images in levels using distance function
  - Error: difference between similarity levels in clusters and samples
  - Adjust distance function based on error
- Output
  - Distance function giving minimal error

6

Process of Learning

- Use a clustering algorithm
- Notion of distance
  - $\Delta = \sum_{f=1}^{F} \alpha_f \Delta_f$
  - Features given as inputs
  - Guess initial weights
- Cluster images in L levels
  - L = number of levels in samples

Figure: Clusters
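The weighted-sum distance above can be sketched in a few lines of Python. This is only an illustration: the function name `weighted_distance` and the numbers in the example are assumptions, and the slides do not say how each per-feature distance is computed.

```python
from typing import Sequence


def weighted_distance(weights: Sequence[float],
                      feature_distances: Sequence[float]) -> float:
    """Delta = sum over features f of alpha_f * Delta_f."""
    if len(weights) != len(feature_distances):
        raise ValueError("one weight per feature is required")
    return sum(a * d for a, d in zip(weights, feature_distances))


# Hypothetical weights and per-feature distances for one pair of images.
example_distance = weighted_distance([0.5, 0.25, 0.25], [2.0, 4.0, 0.0])
```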

7

Process of Learning

- Error pair: level of similarity in clusters not equal to level of similarity in samples
- Error: ratio of number of error pairs over total number of pairs
- Error threshold: fraction of total number of pairs allowed to be error pairs

Figure: Clusters and Training Samples

Training Samples

- P1: (I1, I16), LT(P1) = 2
- P2: (I5, I14), LT(P2) = 1
- P3: (I2, I3), LT(P3) = 0
- P4: (I6, I18), LT(P4) = 1
- P5: (I7, I9), LT(P5) = 0
- P6: (I12, I19), LT(P6) = 2
- P7: (I17, I20), LT(P7) = 1
- P8: (I4, I11), LT(P8) = 3
- P9: (I8, I10), LT(P9) = 2
- P10: (I13, I15), LT(P10) = 3
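The error measure can be illustrated with the ten training pairs listed on this slide. The sample levels below are copied from the slide; the cluster levels are invented for illustration (the slide shows them only as a figure), and the function names are assumptions.

```python
from typing import Dict, List

# Levels of similarity in the training samples (LT), as listed on the slide.
sample_levels: Dict[str, int] = {
    "P1": 2, "P2": 1, "P3": 0, "P4": 1, "P5": 0,
    "P6": 2, "P7": 1, "P8": 3, "P9": 2, "P10": 3,
}

# Hypothetical levels read off a clustering; P2 and P8 disagree with the samples.
cluster_levels: Dict[str, int] = dict(sample_levels)
cluster_levels["P2"] = 2
cluster_levels["P8"] = 2


def find_error_pairs(samples: Dict[str, int], clusters: Dict[str, int]) -> List[str]:
    """Pairs whose level in the clusters differs from the level in the samples."""
    return [pair for pair, level in samples.items() if clusters[pair] != level]


def error_rate(samples: Dict[str, int], clusters: Dict[str, int]) -> float:
    """Number of error pairs divided by the total number of pairs."""
    return len(find_error_pairs(samples, clusters)) / len(samples)


wrong_pairs = find_error_pairs(sample_levels, cluster_levels)
error = error_rate(sample_levels, cluster_levels)
within_threshold = error <= 0.1  # threshold of 0.1 taken as an example value
```

With two error pairs out of ten, the error is 0.2, so a 10% threshold would not yet be met and the weights would be adjusted again.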

8

Process of Learning

- If level of similarity of pair in clusters greater than in samples
  - Images considered closer to each other in clusters than they should be
  - To push them apart, increase weights of some features in distance function

Figure: Clusters and Training Samples (same pairs P1 to P10 and LT values as on slide 7)

9

Process of Learning

- Step: Difference between similarity levels
  - |Level of similarity in training samples - Level of similarity in clusters|
  - $\text{Step} = |L_T(I_a, I_b) - L_C(I_a, I_b)|$
- Blame: Responsibility of a feature for error
  - Distance due to feature f / Total distance between images
  - $\text{Blame} = \Delta_f(I_a, I_b) / \Delta(I_a, I_b)$
- Feature Weight Heuristic
  - To increase weights
    - New weight of feature f = Old weight + Step*Blame
  - Conversely, to decrease weights
    - New weight = Old weight - Step*Blame
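A minimal sketch of the Step and Blame heuristic for a single error pair follows. It assumes that the total distance is the weighted sum of the per-feature distances and that Blame divides the unweighted per-feature distance by that total, which is one reading of the notation on the slide. The function name and the example numbers are hypothetical.

```python
from typing import List, Sequence


def adjust_weights(weights: Sequence[float],
                   feature_distances: Sequence[float],
                   level_samples: int,
                   level_clusters: int) -> List[float]:
    """Apply: new weight = old weight +/- Step * Blame, for one pair of images."""
    # Total distance between the two images (assumed to be the weighted sum).
    total = sum(a * d for a, d in zip(weights, feature_distances))
    # Step = |LT - LC|
    step = abs(level_samples - level_clusters)
    if step == 0 or total == 0:
        return list(weights)  # not an error pair, or blame is undefined
    # Pair is too close in the clusters: increase weights. Otherwise decrease.
    sign = 1.0 if level_clusters > level_samples else -1.0
    # Blame for feature f = distance due to f / total distance.
    return [a + sign * step * (d / total)
            for a, d in zip(weights, feature_distances)]


# Hypothetical pair: LT = 1 in the samples, LC = 2 in the clusters.
new_weights = adjust_weights([0.5, 0.5], [1.0, 3.0],
                             level_samples=1, level_clusters=2)
```

In this example the second feature accounts for more of the distance, so it receives the larger increase. The sketch does not clamp weights, so a decrease can produce a negative weight; the slides do not say how that case is handled.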

10

Process of Learning

- Consider effect of each error pair and adjust weights
- Use adjusted distance function for another iteration of clustering
- Repeat until error below threshold or maximum number of iterations reached
- Output the distance function giving lowest error
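The loop described above might be sketched as follows. The clustering step is passed in as a callback because the slides do not name the clustering algorithm; `toy_cluster_levels` is a deliberately trivial stand-in (a distance cut-off), not a real clustering method. All names, the single-pair data set, and the reading of Blame as per-feature distance over weighted total distance are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Levels = Dict[str, int]


def learn_weights(initial_weights: List[float],
                  sample_levels: Levels,
                  feature_distances: Dict[str, List[float]],
                  cluster_levels: Callable[[List[float]], Levels],
                  error_threshold: float = 0.1,
                  max_iterations: int = 1000) -> Tuple[List[float], float]:
    """Iterate cluster -> measure error -> adjust weights; return the best weights seen."""
    weights = list(initial_weights)
    best_weights, best_error = list(weights), float("inf")
    for _ in range(max_iterations):
        levels = cluster_levels(weights)
        wrong = [p for p, lt in sample_levels.items() if levels[p] != lt]
        error = len(wrong) / len(sample_levels)
        if error < best_error:
            best_weights, best_error = list(weights), error
        if error < error_threshold:
            break
        for pair in wrong:  # consider the effect of each error pair
            deltas = feature_distances[pair]
            total = sum(a * d for a, d in zip(weights, deltas))
            if total == 0:
                continue
            step = abs(sample_levels[pair] - levels[pair])
            sign = 1.0 if levels[pair] > sample_levels[pair] else -1.0
            weights = [a + sign * step * d / total
                       for a, d in zip(weights, deltas)]
    return best_weights, best_error


# Toy data: one pair that the samples place at level 0 (far apart).
toy_samples = {"P1": 0}
toy_distances = {"P1": [1.0, 1.0]}


def toy_cluster_levels(weights: List[float]) -> Levels:
    """Stand-in for clustering: level 1 (close) if weighted distance < 1, else 0."""
    return {p: 1 if sum(a * d for a, d in zip(weights, ds)) < 1.0 else 0
            for p, ds in toy_distances.items()}


learned, final_error = learn_weights([0.25, 0.25], toy_samples,
                                     toy_distances, toy_cluster_levels)
```

Tracking the best weights separately from the current ones reflects the last bullet: the output is the distance function with the lowest error, which is not necessarily the one from the final iteration.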

11

Experimental Evaluation

- Real Images from Nanotechnology and Bioinformatics used for evaluation
  - Parameters: error threshold 0.1 to 0.05, maximum number of iterations = 1000, clustering seeds altered
- Training Data
  - Nanotechnology: 60 images, 3 levels of similarity
  - Bioinformatics: 40 images, 2 levels of similarity
- User Study with Test Data
  - Similarity search performed using learned distance function
  - Experts evaluate effectiveness of results

12

Learning Behavior: Nanotechnology

- Convergence to error below threshold in less than 300 iterations
- Experiments with 5% threshold take longer to converge than 10%
- Not much difference in behavior with random and equal initial weights

Figure panels: Random Initial Weights; Equal Initial Weights

13

Learning Behavior: Bioinformatics

- Error in bioinformatics data fluctuates more than in nanotechnology data
- Possible reasons
  - Fewer images were used as training samples
  - Fewer levels of similarity were used
- Other observations similar to nanotechnology data

Figure panels: Random Initial Weights; Equal Initial Weights

14

Similarity Search

- Using learned distance function, target image compared with source images in distinct test set
- Top 4 matches ranked in order of similarity
- Experts verify that ranking is accurate

Figure: Target Image and Top 4 Matches among Source Images, for Nanotechnology and Bioinformatics
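Ranking the top matches with a learned distance function could look like the sketch below. It assumes every image is described by a vector of numeric features and that the per-feature distance is an absolute difference; the slides do not state either, and the weights and feature values are invented.

```python
from typing import Dict, List, Sequence


def top_matches(weights: Sequence[float],
                target: Sequence[float],
                sources: Dict[str, Sequence[float]],
                k: int = 4) -> List[str]:
    """Return the names of the k source images closest to the target."""
    def distance(name: str) -> float:
        # Weighted sum of per-feature distances (absolute difference assumed).
        return sum(a * abs(t - s)
                   for a, t, s in zip(weights, target, sources[name]))
    return sorted(sources, key=distance)[:k]


# Hypothetical learned weights and feature vectors.
learned_weights = [1.0, 0.5]
target_image = [0.0, 0.0]
source_images = {
    "A": [1.0, 0.0],  # distance 1.0
    "B": [0.0, 1.0],  # distance 0.5
    "C": [2.0, 2.0],  # distance 3.0
    "D": [0.0, 0.0],  # distance 0.0
    "E": [4.0, 0.0],  # distance 4.0
}
ranking = top_matches(learned_weights, target_image, source_images)
```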

15

Conclusions

- Contributions of this work
  - FeaturesRank approach proposed to learn distance function for relative importance of features in images
  - Learned distance function assessed by ranking images for similarity search with real data from nanotechnology and bioinformatics
- Ongoing work
  - Defining objective measures for accuracy
  - Performing comparative studies with state-of-the-art
