Rogerio Feris, March 6, 2014
EECS 6890 – Topics in Information Processing, Spring 2014, Columbia University
http://rogerioferis.com/VisualRecognitionAndSearch2014
Class 6: Attributes and Semantic Features
Visual Recognition And Search Columbia University, Spring 2014
Paper Review Reminder
Paper review due March 11 (solo, no groups):
Perronnin et al, Improving the Fisher Kernel for Large-Scale Image Classification, ECCV 2010
You can use up to 3 late days over the course of the semester
Required content (1-2 pages):
Summary, Strengths and Weaknesses, Experimental Analysis, Proposed Extensions
Check more details at:
http://rogerioferis.com/VisualRecognitionAndSearch2014/PaperReviews.html
Project Update Reminder
Project Update Presentation: March 25/27
Milestones, preliminary results.
More information about the project update requirements coming soon.
What we have seen so far
Low-Level Features
SIFT, SURF, HOG, BRISK, etc.
Feature Coding and Pooling
Bag-of-words, Sparse coding, Fisher vector coding, etc.
Encoding Structure: Part-based Models
Deformable Part-based Models, Poselets, etc.
Attributes And Semantic Features [Today]
Part I: From Low-level to Semantic Visual Representations
Introduction to Semantic Features
Use the scores of semantic classifiers as high-level features
…
Semantic Features
Off-the-shelf Classifiers
Compact / powerful descriptor with semantic meaning (allows “explaining” the decision)
[Figure: an input image is scored by off-the-shelf Water, Sand, and Sky classifiers; the three scores feed a Beach classifier]
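A minimal sketch of this idea, assuming linear concept classifiers over a generic low-level feature vector. The weights, the 4-D feature, and the helper name `semantic_features` are hypothetical and chosen only for illustration.

```python
import numpy as np

def semantic_features(x, concept_models):
    """Return one score per concept classifier for the low-level feature x."""
    return np.array([w @ x + b for (w, b) in concept_models.values()])

# Hypothetical off-the-shelf linear concept classifiers: name -> (weights, bias).
concept_models = {
    "water": (np.array([1.0, 0.0, 0.0, 0.0]), 0.0),
    "sand":  (np.array([0.0, 1.0, 0.0, 0.0]), 0.0),
    "sky":   (np.array([0.0, 0.0, 1.0, 0.0]), 0.0),
}

# Hypothetical scene classifier that operates on the three concept scores.
beach_w, beach_b = np.array([1.0, 1.0, 1.0]), -2.0

x = np.array([0.9, 0.8, 0.7, 0.1])         # made-up low-level feature
s = semantic_features(x, concept_models)   # compact semantic descriptor
beach_score = beach_w @ s + beach_b

# Each term is attributable to a named concept, which is what makes
# the decision "explainable".
contributions = dict(zip(concept_models, beach_w * s))
```

The descriptor has one dimension per concept, so it stays compact no matter how large the low-level feature is.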
Semantic Features (Frame-Level)
Illustration of early IBM work (multimedia community) describing this concept
[John Smith et al, Multimedia Semantic Indexing Using Model Vectors, ICME 2003]
Concatenation / Dimensionality Reduction
Semantic Features (Frame-level)
System evolved to the IBM Multimedia Analysis and Retrieval System (IMARS)
Ensemble Learning
Rapid event modeling, e.g., “accident with high-speed skidding”
Discriminative semantic basis [Rong Yan et al, Model-Shared Subspace Boosting for Multi-label Classification, KDD 2007]
Classemes (Frame-level)
[L. Torresani et al, Efficient Object Category Recognition Using Classemes, ECCV 2010]
Noisy Labels
Images used to train the “table” classeme (from Google image search)
Descriptor is formed by concatenating the outputs of weakly trained classifiers called classemes (trained with noisy labels)
Classemes (Frame-level)
Compact and efficient descriptor, useful for large-scale classification
Features are not really semantic!
Semantic Features (Object Level)
Object Bank
http://vision.stanford.edu/projects/objectbank/
[Li-Jia Li et al, Object Bank: A High-Level Image Representation for Scene Classification and Semantic Feature Sparsification, NIPS 2010]
Source code available (~7 seconds per image)
Shifting from Naming to Describing:
Representations based on Semantic Attributes
Semantic Attributes
Modifiers rather than (or in addition to) nouns
Semantic properties that are shared among objects
Attributes are category-independent and transferable
[Figure: Naming ("?") vs. Describing (Bald, Beard, Red Shirt)]
Examples of Semantic Attributes
http://whatbird.com
Examples of Semantic Attributes
[Lampert et al, Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, CVPR 2009]
Examples of Semantic Attributes
[Farhadi et al, Describing Objects by their Attributes, CVPR 2009]
Examples of Semantic Attributes
[Berg et al, Automatic Attribute Discovery and Characterization, ECCV 2010]
Examples of Semantic Attributes
[Chen et al, Describing Clothing by Semantic Attributes, ECCV 2012]
Examples of Semantic Attributes
http://www.galaxyzoo.org/
Attribute Models
Slide credit: Devi Parikh
[Kumar et al., Describable Visual Attributes for Face Verification and Image Search, PAMI 2011]
Binary Attributes (present/absent, or a confidence)
Attribute Models
Slide credit: Devi Parikh
[Figure: image pairs compared by attribute strength, e.g., "> natural", "< smiling"]
Parikh and Grauman, Relative Attributes, ICCV 2011
Max-margin learning to rank formulation of Joachims 2002
Relative Attributes
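A minimal sketch of a max-margin ranking function for one relative attribute, assuming a linear model trained with a pairwise hinge loss and plain subgradient descent. It is a simplification: it uses only ordered pairs and omits the similarity constraints of the full formulation. The toy features, the pair list, and the name `train_ranker` are hypothetical.

```python
import numpy as np

def train_ranker(X, ordered_pairs, C=1.0, lr=0.1, epochs=200):
    """Learn w so that w @ X[i] > w @ X[j] for every (i, j) in ordered_pairs.

    Minimizes 0.5 * ||w||^2 + C * sum(max(0, 1 - w @ (X[i] - X[j])))
    with subgradient descent.
    """
    w = np.zeros(X.shape[1])
    diffs = np.array([X[i] - X[j] for i, j in ordered_pairs])
    for _ in range(epochs):
        violated = diffs @ w < 1.0                  # pairs inside the margin
        grad = w - C * diffs[violated].sum(axis=0)  # subgradient of the objective
        w -= lr * grad
    return w

# Made-up 2-D image features; pair (i, j) means "image i shows more of the
# attribute (e.g., smiling) than image j".
X = np.array([[0.1, 0.5], [0.4, 0.2], [0.9, 0.6]])
ordered_pairs = [(2, 1), (1, 0), (2, 0)]

w = train_ranker(X, ordered_pairs)
strength = X @ w   # real-valued attribute strength per image
```

Unlike a binary attribute classifier, the output is used only through comparisons: a larger `strength` means "more of the attribute".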
Attribute-Based Classification
Scalable Learning
Attribute-based Classification
[Lampert et al, Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, CVPR 2009]
Recognition of Unseen Classes (Zero-Shot Learning)
1) Train semantic attribute classifiers
2) Obtain a classifier for an unseen object (no training samples) by just specifying which attributes it has
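The two steps above can be sketched as follows, assuming the attribute classifiers output independent probabilities and each unseen class is described by a binary attribute signature. This is a simplified illustration (for instance, it ignores attribute priors), and the attribute names, signatures, and probabilities are hypothetical.

```python
import numpy as np

def zero_shot_scores(attr_probs, class_signatures):
    """Score each unseen class by how well predicted attributes match its signature.

    attr_probs: shape (M,), estimated p(attribute m is present | image).
    class_signatures: dict mapping class name -> binary array of shape (M,).
    """
    p = np.clip(attr_probs, 1e-6, 1 - 1e-6)   # avoid log(0)
    scores = {}
    for name, sig in class_signatures.items():
        # Log-probability of the signature under independent attributes.
        log_terms = np.where(sig == 1, np.log(p), np.log(1 - p))
        scores[name] = float(log_terms.sum())
    return scores

# Step 2: unseen classes are specified only by which attributes they have.
attributes = ["striped", "four-legged", "has wings"]   # hypothetical
class_signatures = {
    "zebra": np.array([1, 1, 0]),
    "eagle": np.array([0, 0, 1]),
}

# Step 1 (assumed already done): outputs of trained attribute classifiers.
attr_probs = np.array([0.9, 0.8, 0.1])

scores = zero_shot_scores(attr_probs, class_signatures)
prediction = max(scores, key=scores.get)
```

No training image of "zebra" or "eagle" is needed; adding a new class only requires a new signature row.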
Zero-Shot Learning
[Figure: flat multi-class classification vs. attribute-based classification; semantic attribute classifiers link images to the unseen categories]
Class-Attribute Associations
Manual Specification of Class-Attribute Associations
Class-Attribute Associations
Associations may be extracted automatically from other sources
Rohrbach et al., "What Helps Where – And Why? Semantic Relatedness for Knowledge Transfer", CVPR 2010
Label Embedding
Label Embedding Framework
Manual Specification of Attributes
Akata et al., “Label Embedding for Attribute-based Classification”, CVPR 2013
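A minimal sketch of prediction in a label embedding framework, assuming a bilinear compatibility F(x, y) = θ(x)ᵀ W φ(y) between an image feature θ(x) and a class embedding φ(y) (here, an attribute vector). In practice W would be learned from training data; the hand-set W, the features, and the name `predict_label` are hypothetical.

```python
import numpy as np

def predict_label(theta_x, W, label_embeddings):
    """Return the label whose embedding is most compatible with the image.

    theta_x: image feature, shape (D,)
    W: compatibility matrix, shape (D, E)
    label_embeddings: dict mapping label -> embedding of shape (E,)
    """
    names = list(label_embeddings)
    Phi = np.stack([label_embeddings[n] for n in names])   # (num_labels, E)
    scores = Phi @ (W.T @ theta_x)                         # F(x, y) for every y
    return names[int(np.argmax(scores))], scores

# Class embeddings given by manually specified attributes (hypothetical).
label_embeddings = {
    "zebra": np.array([1.0, 1.0, 0.0]),
    "eagle": np.array([0.0, 0.0, 1.0]),
}

# Hand-set stand-in for a learned compatibility matrix (D=2, E=3).
W = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.0, 1.0]])

theta_x = np.array([0.8, 0.1])   # made-up image feature
label, scores = predict_label(theta_x, W, label_embeddings)
```

Because classes enter only through their embeddings, a class with no training images can be scored as soon as its embedding is available.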
Label Embedding
Frome et al., "DeViSE: A Deep Visual-Semantic Embedding Model", NIPS 2013
Label Embedding Framework
Automatic Discovery of word associations
Label Embedding
Language Model Source Code: https://code.google.com/p/word2vec/
Zero-Shot Learning / Semantically close mistakes
Label Embedding Framework
Automatic Discovery of word associations
Attributes as mid-level features
Face verification [Kumar et al, ICCV 2009]
Action recognition [Liu et al, CVPR 2011]
Semantic attributes + discriminative (non-semantic) features
Attributes as mid-level features
Person Re-identification [Layne et al, BMVC 2012]
Bird Categorization [Farrell et al, ICCV 2011]
Attributes as mid-level features
[Dhar et al, High Level Describable Attributes for Predicting Aesthetics and Interestingness, CVPR 2011]
Attributes as mid-level features
Slide credit: Tamara Berg
…
Detecting Interesting Insects
Attributes as mid-level features
Slide credit: Tamara Berg
…
Detecting Interesting Beaches
Attributes as mid-level features
Note: Several recent methods use the term “attributes” to refer to non-semantic model outputs. In this case, attributes are just mid-level features, like PCA, hidden layers in neural nets, … (non-interpretable splits)
Attributes for Fine-Grained Categorization
Fine-Grained Categorization
Visipedia (http://visipedia.org/)
Machines collaborating with humans to organize visual knowledge, connecting text to images, images to text, and images to images
Easy annotation interface for experts (powered by computer vision)
Picture credit: Serge Belongie
Visual Query: Fine-grained Bird Categorization
Fine-Grained Categorization
Slide Credit: Christoph Lampert
Is it an African or Indian Elephant?
[Figure: African elephant and Indian elephant]
Example-based Fine-Grained Categorization is Hard!!
Fine-Grained Categorization
Is it an African or Indian Elephant?
Visual distinction of subordinate categories may be quite subtle, usually based on Parts and Attributes
[Figure: African elephant (larger ears) vs. Indian elephant (smaller ears)]
Fine-Grained Categorization
Codebook
Standard classification methods may not be suitable because the variation between classes is small and intra-class variation is still high. [B. Yao, CVPR 2012]
Fine-Grained Categorization
Humans rely on field guides!
Field guides usually refer to parts and attributes of the object
Slide Credit: Pietro Perona
Fine-Grained Categorization
[Branson et al, Visual Recognition with Humans in the Loop, ECCV 2010]
Computer vision reduces the amount of human interaction (minimizes the number of questions)
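A minimal sketch of how a vision-based class posterior can reduce the number of questions, assuming yes/no attribute questions chosen by expected information gain. The selection criterion, the toy posterior, the answer model, and the name `best_question` are assumptions made for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def best_question(class_post, p_yes_given_class):
    """Pick the yes/no question with the highest expected information gain.

    class_post: (C,) current posterior over classes (e.g., from computer vision)
    p_yes_given_class: (Q, C) probability of a "yes" answer to question q for class c
    """
    h0 = entropy(class_post)
    gains = []
    for p_yes_c in p_yes_given_class:
        p_yes = float(p_yes_c @ class_post)
        h_after = 0.0
        if p_yes > 0:   # posterior if the user answers "yes"
            h_after += p_yes * entropy(p_yes_c * class_post / p_yes)
        if p_yes < 1:   # posterior if the user answers "no"
            h_after += (1 - p_yes) * entropy((1 - p_yes_c) * class_post / (1 - p_yes))
        gains.append(h0 - h_after)
    return int(np.argmax(gains)), gains

# Vision has nearly ruled out class 2, so a question about class 2 is less useful.
class_post = np.array([0.5, 0.4, 0.1])
p_yes_given_class = np.array([
    [0.0, 0.0, 1.0],   # question 0: separates class 2 from the rest
    [1.0, 0.0, 0.5],   # question 1: separates class 0 from class 1
])

q, gains = best_question(class_post, p_yes_given_class)
```

With a uniform posterior (no vision), both questions would look far more similar in value; the vision-based posterior is what steers the system to the question that resolves the remaining ambiguity.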
Fine-Grained Categorization
[Wah et al, Multiclass Recognition and Part Localization with Humans in the Loop, ICCV 2011]
Localized part and attribute detectors.
Questions include asking the user to localize parts.
Fine-Grained Categorization
Video Demo
Fine-Grained Categorization
http://www.vision.caltech.edu/visipedia/CUB-200-2011.html
Fine-Grained Categorization
Check the fine-grained visual categorization workshop: http://www.fgvc.org/
Fine-Grained Categorization
Is fine-grained recognition different? Check https://sites.google.com/site/fgcomp2013/
People Search in Surveillance Videos
Traditional Approaches: Face Recognition (“Naming”)
Face recognition is very challenging under lighting changes, pose variation, and low-resolution imagery (typical conditions in surveillance scenarios)
Attribute-based People Search (“Describing”)
[Vaquero et al, Attribute-based People Search in Surveillance Environments, WACV 2009]
Rather than relying only on face recognition, this work provides a complementary people search framework based on semantic attributes
Query Example:
“Show me all bald people at the 42nd street station last month with dark skin, wearing sunglasses, wearing a red jacket”
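A minimal sketch of how such a query could be evaluated, assuming each detected person is stored with a camera location, a timestamp, and per-attribute detector confidences. The record layout, the confidence threshold, the ranking by weakest attribute, and the names `Detection` and `search` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Detection:
    camera: str
    time: datetime
    attributes: dict   # attribute name -> detector confidence in [0, 1]

def search(detections, required, camera, start, end, threshold=0.5):
    """Return detections at `camera` within [start, end] that have all required attributes."""
    hits = [
        d for d in detections
        if d.camera == camera
        and start <= d.time <= end
        and all(d.attributes.get(a, 0.0) >= threshold for a in required)
    ]
    # Rank by the weakest required attribute, so one doubtful attribute lowers the rank.
    return sorted(
        hits,
        key=lambda d: min((d.attributes.get(a, 0.0) for a in required), default=1.0),
        reverse=True,
    )

# Made-up detection records.
detections = [
    Detection("42nd street", datetime(2014, 2, 10, 9, 0),
              {"bald": 0.9, "sunglasses": 0.8, "red jacket": 0.7}),
    Detection("42nd street", datetime(2014, 2, 11, 18, 30),
              {"bald": 0.2, "sunglasses": 0.9, "red jacket": 0.9}),
    Detection("14th street", datetime(2014, 2, 12, 8, 15),
              {"bald": 0.95, "sunglasses": 0.9, "red jacket": 0.9}),
]

results = search(
    detections,
    required=["bald", "sunglasses", "red jacket"],
    camera="42nd street",
    start=datetime(2014, 2, 1),
    end=datetime(2014, 2, 28, 23, 59),
)
```

The query is a plain description, so no example image of the target person is required.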
People Search in Surveillance Videos
Feris et al, ICMR 2014
People Search in Surveillance Videos
Boston Bombing Event
“Show me all images of people matching the suspect description from time X to time Y from all cameras in area Z.”
Ability to spot a person with e.g., a white hat in a crowded scene
Suspect #1 found in 4 images in top 8 results
Suspect #2 found in 3 images in top page
1071 detected faces from 50 high-res Boston images (all from Flickr)
People Search in Surveillance Videos
People search based on textual descriptions: it does not require training images for the target suspect.
Robustness: attribute detectors are trained using lots of training images covering different lighting conditions, pose variation, etc.
Works well in low-resolution imagery (typical in video surveillance scenarios)
People Search in Surveillance Videos
[Siddiquie, Feris and Davis, “Image Ranking and Retrieval Based on Multi-Attribute Queries”, CVPR 2011]
Modeling attribute correlations
MugHunt Demo
http://mughunt.securics.com/
Whittle Search
Slide credit: Kristen Grauman
Whittle Search
Check the Whittle Search demo at: http://godel.ece.vt.edu/whittle/
Resources
http://rogerioferis.com/VisualRecognitionAndSearch2014/Resources.html