Learning the Relative Importance of Features in Image Data
Aparna Varde, Elke Rundensteiner, Giti Javidi, Ehsan Sheybani and Jianyu Liang
IEEE ICDE’s DBRank
Istanbul, Turkey
April 2007

Introduction
  Scientific domains
    Images from phenomena
  Image features
    Visual features
    Metadata features
  Comparison of images
    Based on features
  [Figures: Silicon Nanopore; Herb Leaf]

Motivation
  Consider a similarity search process
  Some features are more important than others
  Experts have subjective notions of comparison
  Need to learn a feature-based distance function
  [Figure: Target Image; Source Images]

Goals
  Given
    Training data on images and their applicable features
  Learn
    Distance function for image comparison
    Function should preserve relative importance of features in the domain

Proposed Approach: FeaturesRank
  Input
    Training samples: pairs of images
    Level of similarity for each pair
  Distance function: weighted sum of features
  Process: iterative approach
    Cluster images in levels using distance function
    Error: difference between similarity levels in clusters and samples
    Adjust distance function based on error
  Output
    Distance function giving minimal error

Process of Learning
  Use a clustering algorithm
    Notion of distance
    $\Delta = \sum_{f=1}^{F} \alpha_f \Delta_f$
  Features given as inputs
    Guess initial weights
  Cluster images in L levels
    L = number of levels in samples
  [Figure: Clusters]
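
The weighted-sum distance on this slide can be sketched in a few lines. This is only an illustration: the function name `weighted_distance` and the numeric values are made up, and how each per-feature distance $\Delta_f$ is computed is not specified on the slides.

```python
from typing import Sequence


def weighted_distance(weights: Sequence[float],
                      feature_distances: Sequence[float]) -> float:
    """Delta = sum over features f of alpha_f * Delta_f."""
    if len(weights) != len(feature_distances):
        raise ValueError("one weight per feature is required")
    return sum(a * d for a, d in zip(weights, feature_distances))


# Hypothetical example: three features with equal initial weights.
alphas = [1.0, 1.0, 1.0]
deltas = [0.5, 2.0, 1.5]  # made-up per-feature distances between two images
print(weighted_distance(alphas, deltas))  # 4.0
```

Raising or lowering one entry of `alphas` changes how much that feature contributes to the overall distance, which is the quantity the learning process adjusts.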

Process of Learning
  Error pair: level of similarity in clusters not equal to level of similarity in samples
  Error: ratio of number of error pairs to total number of pairs
  Error threshold: fraction of total number of pairs allowed to be error pairs
  Training samples (pair, level of similarity LT):
    P1: (I1, I16), LT(P1) = 2
    P2: (I5, I14), LT(P2) = 1
    P3: (I2, I3), LT(P3) = 0
    P4: (I6, I18), LT(P4) = 1
    P5: (I7, I9), LT(P5) = 0
    P6: (I12, I19), LT(P6) = 2
    P7: (I17, I20), LT(P7) = 1
    P8: (I4, I11), LT(P8) = 3
    P9: (I8, I10), LT(P9) = 2
    P10: (I13, I15), LT(P10) = 3
  [Figure: Clusters]
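
The error definition above can be illustrated with the ten training pairs from this slide. The training levels are copied from the slide; the cluster levels are hypothetical, since the cluster figure did not survive extraction, and the function name `error_rate` is an assumption.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def error_rate(train_levels: Dict[Pair, int],
               cluster_levels: Dict[Pair, int]) -> Tuple[float, List[Pair]]:
    """Return (error, error_pairs): error is the fraction of pairs whose
    level in the clusters differs from the level in the training samples."""
    if not train_levels:
        raise ValueError("at least one training pair is required")
    error_pairs = [p for p in train_levels
                   if cluster_levels[p] != train_levels[p]]
    return len(error_pairs) / len(train_levels), error_pairs


# Levels of similarity LT(P1)..LT(P10) as listed on the slide.
train = {
    ("I1", "I16"): 2, ("I5", "I14"): 1, ("I2", "I3"): 0, ("I6", "I18"): 1,
    ("I7", "I9"): 0, ("I12", "I19"): 2, ("I17", "I20"): 1, ("I4", "I11"): 3,
    ("I8", "I10"): 2, ("I13", "I15"): 3,
}

# Hypothetical clustering result: identical except for two pairs.
clusters = dict(train)
clusters[("I1", "I16")] = 3
clusters[("I7", "I9")] = 1

error, bad_pairs = error_rate(train, clusters)
print(error)      # 0.2
print(bad_pairs)  # [('I1', 'I16'), ('I7', 'I9')]
```

With an error threshold of 0.1, this hypothetical clustering would not yet be acceptable, because two of the ten pairs are error pairs.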

Process of Learning
  If level of similarity of a pair in clusters is greater than in samples
    Images are considered closer to each other in clusters than they should be
    To push them apart, increase weights of some features in distance function
  [Figure: Clusters and Training Samples; same ten pairs P1 to P10 as on the previous slide]

Process of Learning
  Step: difference between similarity levels
    |Level of similarity in training samples - Level of similarity in clusters|
    $\text{Step} = |L_T(I_a, I_b) - L_C(I_a, I_b)|$
  Blame: responsibility of a feature for error
    Distance due to feature f / Total distance between images
    $\text{Blame} = \Delta_f(I_a, I_b) / \Delta(I_a, I_b)$
  Feature Weight Heuristic
    To increase weights
      New weight of feature f = Old weight + Step * Blame
    Conversely, to decrease weights
      New weight = Old weight - Step * Blame
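
A minimal sketch of the feature weight heuristic for a single error pair follows. It assumes that the total distance is the weighted sum defined earlier, and that weights are increased when the pair's level in the clusters exceeds its level in the training samples and decreased otherwise. The function name `adjust_weights` and the numbers are hypothetical; the slides do not say what happens if a weight becomes negative, so no clamping is applied here.

```python
from typing import List, Sequence


def adjust_weights(weights: Sequence[float],
                   feature_distances: Sequence[float],
                   level_train: int,
                   level_cluster: int) -> List[float]:
    """Apply New weight = Old weight +/- Step * Blame for one image pair."""
    # Total distance: weighted sum of the per-feature distances.
    total = sum(a * d for a, d in zip(weights, feature_distances))
    # Nothing to adjust if the pair is not an error pair, or if the total
    # distance is zero (Blame would be undefined).
    if level_train == level_cluster or total == 0:
        return list(weights)
    step = abs(level_train - level_cluster)
    # Cluster level greater than training level: images are too close,
    # so increase weights; otherwise decrease them.
    sign = 1.0 if level_cluster > level_train else -1.0
    return [a + sign * step * (d / total)
            for a, d in zip(weights, feature_distances)]


# Made-up pair: two features, training level 1, cluster level 2.
print(adjust_weights([1.0, 1.0], [1.0, 3.0], level_train=1, level_cluster=2))
# [1.25, 1.75]
```

In this example the second feature accounts for three quarters of the distance, so it receives three quarters of the adjustment.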

Process of Learning
  Consider effect of each error pair and adjust weights
  Use adjusted distance function for another iteration of clustering
  Repeat until error below threshold or maximum number of iterations reached
  Output the distance function giving lowest error
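
The iteration described above might be organised as in the sketch below. Several parts are assumptions rather than the authors' method: the clustering step is replaced by a caller-supplied function `assign_level` that maps a distance to a similarity level, all adjustments within one iteration are computed from the weights at the start of that iteration, and an error equal to the threshold counts as acceptable. All names and numbers are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def distance(weights: Sequence[float], deltas: Sequence[float]) -> float:
    return sum(a * d for a, d in zip(weights, deltas))


def learn_weights(weights: Sequence[float],
                  train_levels: Dict[Pair, int],
                  feature_distances: Dict[Pair, Sequence[float]],
                  assign_level: Callable[[float], int],
                  threshold: float,
                  max_iterations: int) -> Tuple[List[float], float]:
    """Return the weights with the lowest error seen, and that error."""
    if not train_levels:
        raise ValueError("at least one training pair is required")
    weights = list(weights)
    best_weights, best_error = list(weights), float("inf")
    for _ in range(max_iterations):
        # Stand-in for clustering: derive a level for each pair.
        cluster_levels = {
            p: assign_level(distance(weights, feature_distances[p]))
            for p in train_levels
        }
        error_pairs = [p for p in train_levels
                       if cluster_levels[p] != train_levels[p]]
        error = len(error_pairs) / len(train_levels)
        if error < best_error:
            best_weights, best_error = list(weights), error
        if error <= threshold:
            break
        # Consider the effect of each error pair and adjust the weights.
        adjusted = list(weights)
        for p in error_pairs:
            total = distance(weights, feature_distances[p])
            if total == 0:
                continue  # Blame is undefined for a zero total distance
            step = abs(train_levels[p] - cluster_levels[p])
            sign = 1.0 if cluster_levels[p] > train_levels[p] else -1.0
            for f, d in enumerate(feature_distances[p]):
                adjusted[f] += sign * step * d / total
        weights = adjusted
    return best_weights, best_error


# Toy data: level 1 means "close", level 0 means "far" (made up).
toy_train = {("A1", "A2"): 0, ("B1", "B2"): 1}
toy_deltas = {("A1", "A2"): [1.0, 0.0], ("B1", "B2"): [0.0, 1.0]}
result = learn_weights([1.0, 1.0], toy_train, toy_deltas,
                       assign_level=lambda d: 1 if d < 1.5 else 0,
                       threshold=0.1, max_iterations=1000)
print(result)  # ([2.0, 1.0], 0.0)
```

In the toy run, the first feature is the only one separating the pair that should be far apart, so its weight grows until that pair falls into the lower similarity level.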

Experimental Evaluation
  Real images from nanotechnology and bioinformatics used for evaluation
    Parameters: error threshold 0.1 to 0.05, maximum number of iterations = 1000, clustering seeds altered
  Training Data
    Nanotechnology: 60 images, 3 levels of similarity
    Bioinformatics: 40 images, 2 levels of similarity
  User Study with Test Data
    Similarity search performed using learned distance function
    Experts evaluate effectiveness of results

Learning Behavior: Nanotechnology
  Convergence to error below threshold in fewer than 300 iterations
  Experiments with 5% threshold take longer to converge than with 10%
  Not much difference in behavior between random and equal initial weights
  [Figures: Random Initial Weights; Equal Initial Weights]

Learning Behavior: Bioinformatics
  Error in bioinformatics data fluctuates more than in nanotechnology data
  Possible reasons
    Fewer images were used as training samples
    Fewer levels of similarity were used
  Other observations similar to nanotechnology data
  [Figures: Random Initial Weights; Equal Initial Weights]

Similarity Search
  Using learned distance function, target image compared with source images in distinct test set
  Top 4 matches ranked in order of similarity
  Experts verify that ranking is accurate
  [Figure: Target Image and Top 4 Matches among Source Images, for Nanotechnology and Bioinformatics]
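
The ranking step can be sketched as a sort on the learned weighted distance. The weights, image names and per-feature distances below are made up for illustration, and `top_matches` is a hypothetical helper, not part of the presented system.

```python
from typing import Dict, List, Sequence


def top_matches(weights: Sequence[float],
                distances_to_target: Dict[str, Sequence[float]],
                k: int = 4) -> List[str]:
    """Rank source images by weighted distance to the target, closest first,
    and return the names of the first k."""
    ranked = sorted(
        distances_to_target,
        key=lambda name: sum(a * d for a, d in
                             zip(weights, distances_to_target[name])),
    )
    return ranked[:k]


learned_weights = [2.0, 1.0]  # hypothetical learned weights
sources = {                   # per-feature distances to the target image
    "S1": [0.25, 0.5],
    "S2": [0.5, 0.25],
    "S3": [0.0, 0.25],
    "S4": [1.0, 1.0],
    "S5": [0.25, 0.25],
}
print(top_matches(learned_weights, sources))  # ['S3', 'S5', 'S1', 'S2']
```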

Conclusions
  Contributions of this work
    FeaturesRank approach proposed to learn distance function for relative importance of features in images
    Learned distance function assessed by ranking images for similarity search with real data from nanotechnology and bioinformatics
  Ongoing work
    Defining objective measures for accuracy
    Performing comparative studies with state-of-the-art