Automatic Image Annotation (AIA)


Seminar Report Presented to:

Dr. Shanbehzadeh

Presented by: Farzaneh Rezaei

November 2015

2

What is the goal of computer vision?

Perceive the story behind the picture

See the world! But what exactly does it mean to see?
Source: WALL-E movie: Pixar, Walt Disney Pictures

3

Outline

Introduction To Image Annotation
• What?
• Why?

Story Behind AIA
• Components of AIA
• Progress of AIA
• Issues & Conclusions

Going deeper!
• Feature Extraction
• Learning Methods
• Deep Learning
• Conclusions

Useful Information
• Recent Articles
• Toolbox
• Databases
• Authors

Conclusions
• References


5

What is Automatic Image Annotation?
Automatic image annotation is the task of automatically assigning words to an image that describe the content of the image.

Munirathnam Srikanth, et al. Exploiting ontologies for automatic image annotation

Source: Personalizing Automated Image Annotation Using Cross-Entropy: https://ivi.fnwi.uva.nl/isis/publications/bibtexbrowser.php?key=LiICM2011&bib=all.bib

6

What is Automatic Image Annotation? (Cont.)

Source: MS COCO Captioning Challenge: http://mscoco.org/dataset/#captions-challenge2015

7

3,000 Photos Are Uploaded Every Second to Facebook

Why is Image Annotation Important?
Recently, we have witnessed an exponential growth of user-generated videos and images due to the booming of social networks such as Facebook and Flickr.

Source: petapixel.com

Source: http://petapixel.com/2012/02/01/3000-photos-are-uploaded-every-second-to-facebook/

8

Why is Image Annotation Important? (Cont.)

Source: Barriuso, A., & Torralba, A. (2012). Notes on image annotation

• Applications, e.g. photo-organizer apps
• Image classification systems

9

Number of articles per year with "Automatic Image Annotation" in the article title

[Chart: article counts per year, 2000–2015; y-axis from 0 to 70; year on the x-axis. Reported by: Google Scholar]

10

Outline

Introduction To Image Annotation
• What?
• Why?

Story Behind AIA
• Components of AIA
• Progress of AIA
• Issues & Conclusions

Going deeper!
• Feature Extraction
• Learning Methods
• Deep Learning
• Conclusions

Useful Information
• Recent Articles
• Toolbox
• Databases
• Authors

Conclusions
• References

11

How do you annotate these images?

12

What are the components of an Automatic Image Annotation system?

13

How to classify images?

What are the components of an Automatic Image Annotation system?

14

Feature Extraction

Classification Methods

What are the components of an Automatic Image Annotation system?

15

What are the components of an Automatic Image Annotation system?

Classification Methods

Feature Extraction

16

What are the components of an Automatic Image Annotation system?

Feature Extraction

Classification Methods

Pattern Recognition!

17

Slide Credit

18

An Example of classical approaches in AIA

Source: Zhang, D., Islam, M. M., & Lu, G. (2012). A review on automatic image annotation techniques. Pattern Recognition, 45(1), 346–362. doi:10.1016/j.patcog.2011.05.013

19

Issues of classical approaches
Theoretical Limitations of Shallow Architectures*

Functions that can be compactly represented by a depth-k architecture might require an exponential number of computational elements to be represented by a depth-(k − 1) architecture.

*Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning

20

Issues of classical approaches (Cont.)
Theoretical Limitations of Shallow Architectures

• Shallow? Deep?

• Functions?

• Compact?

• Depth?

• Computational Elements?

logic circuit

21

Issues of classical approaches (Cont.)

Picture Source: Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning

[Figure: example circuits of depth 4 and depth 3]

22

Issues of classical approaches (Cont.)
Theoretical Limitations of Shallow Architectures

• Linear regression and logistic regression have depth 1, i.e., a single level.

• Ordinary multi-layer neural networks, with the most common choice of one hidden layer, have depth 2.

• Decision trees can also be seen as having two levels.

• Boosting (Freund & Schapire, 1996) usually adds one level to its base learners: that level computes a vote or linear combination of the outputs of the base learners.
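To make "depth" concrete, here is a minimal numpy sketch (an illustration added here, not from the original slides; sizes and weights are made up) of a depth-1 model versus a depth-2 model with one hidden layer:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                                   # one 4-dimensional input

# Depth 1: logistic regression -- a single level of adaptive computation.
W1, b1 = rng.normal(size=(1, 4)), rng.normal(size=1)
p_depth1 = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))

# Depth 2: one hidden layer (level 1) feeding an output layer (level 2).
Wh, bh = rng.normal(size=(8, 4)), rng.normal(size=8)
Wo, bo = rng.normal(size=(1, 8)), rng.normal(size=1)
h = np.tanh(Wh @ x + bh)
p_depth2 = 1.0 / (1.0 + np.exp(-(Wo @ h + bo)))

print(p_depth1, p_depth2)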

23

Issues of classical approaches (Cont.)
Theoretical Limitations of Shallow Architectures

• Shallow? Deep?

• Functions

• Compact

• Depth

• Computational Elements

24

Issues of classical approaches
Theoretical Limitations of Shallow Architectures*

Functions that can be compactly represented by a depth-k architecture might require an exponential number of computational elements to be represented by a depth-(k − 1) architecture.

*Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning

25

Issues of classical approaches (Cont.)

• A two-layer circuit of logic gates can represent any boolean function (Mendelson, 1997).

• With depth-two logical circuits, most boolean functions require an exponential number of logic gates (Wegener, 1987) to be represented (with respect to input size).

• There are functions computable with a polynomial-size logic-gate circuit of depth k that require exponential size when restricted to depth k − 1 (Hastad, 1986). The proof of this theorem relies on earlier results (Yao, 1985) showing that d-bit parity circuits of depth 2 have exponential size.
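The parity result can be made concrete with a small sketch (an added illustration, not from the slides): a "deep" chain of 2-input XOR gates computes d-bit parity with d − 1 gates, while a depth-2 OR-of-ANDs form must enumerate all 2^(d−1) odd-parity input patterns.

from itertools import product

def parity_deep(bits):
    # Chain of 2-input XOR gates: size grows linearly with the number of bits.
    out = 0
    for b in bits:
        out ^= b
    return out

def parity_depth2(bits):
    # Depth-2 form: OR over AND-terms (minterms), one per odd-parity pattern,
    # i.e. 2**(d-1) terms -- exponential in the input size d.
    d = len(bits)
    minterms = [p for p in product((0, 1), repeat=d) if sum(p) % 2 == 1]
    return int(any(all(b == v for b, v in zip(bits, p)) for p in minterms))

x = (1, 0, 1, 1)
assert parity_deep(x) == parity_depth2(x) == 1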

26

Issues of classical approaches (Cont.)

• One might wonder whether these computational complexity results for boolean circuits are relevant to machine learning. See Orponen (1994) for an early survey of theoretical results in computational complexity relevant to learning algorithms.

• Interestingly, many of the results for boolean circuits can be generalized to architectures whose computational elements are linear threshold units (also known as artificial neurons (McCulloch & Pitts, 1943)), which compute:

f(x) = 1[wᵀx + b ≥ 0]    (1)

with parameters w and b.
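A minimal numpy sketch of such a linear threshold unit (the weights below are arbitrary illustration values, not from any cited work):

import numpy as np

def threshold_unit(x, w, b):
    # Eq. (1): output 1 if w.T x + b >= 0, otherwise 0.
    return int(np.dot(w, x) + b >= 0)

w = np.array([0.5, -1.0, 0.25])
b = -0.1
print(threshold_unit(np.array([1.0, 0.2, 0.4]), w, b))   # 0.5 - 0.2 + 0.1 - 0.1 = 0.3 -> 1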

27

Issues of classical approaches (Cont.)

1 Theoretical Limitations of Shallow Architectures

2 Theoretical Advantages of Deep Architectures

Which one ?? !

28

Slide Credit

29

Slide Credit

30

How to assign a word to an image?

What are the components of an Automatic Image Annotation system?
• Feature Extraction
• Classification Methods
• Pattern Recognition!

Components of AIA

Classical or shallow structure issues

31
Source: http://graffiti-artist.net/corporate-offices/ny-facebook-office-graffiti/

32

Outline

Introduction To Image Annotation
• What?
• Why?

Story Behind AIA
• Components of AIA
• Progress of AIA
• Issues & Conclusions

Going deeper!
• Feature Extraction
• Learning Methods
• Deep Learning
• Conclusions

Useful Information
• Recent Articles
• Toolbox
• Databases
• Authors

Conclusions
• References

33

Going Deeper!

Feature Extraction & Representation
• Color
• Texture
• Shape
• Segmentation

Learning Methods
• ANN
• SVM
• Bayes
• Metadata

34

Feature Extraction

Color:
• Color Histogram
• Color Moments
• Color Coherence Vector
• Color Correlogram
• Scalable Color Descriptor
• Color Structure Descriptor
• Dominant Color Descriptor

Texture:
• Spatial: statistical, structural, model-based
• Spectral: FT, DCT, wavelet, ...

35

Color

36

Color

37

Color: Comparisons

Color method | Pros | Cons
Histogram | Simple to compute, intuitive | High dimension, no spatial info, sensitive to noise
CM | Compact, robust | Not enough to describe all colors, no spatial info
CCV | Spatial info | High dimension, high computation cost
Correlogram | Spatial info | Very high computation cost; sensitive to noise, rotation and scale
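A minimal numpy sketch of the two simplest descriptors in the table, a global color histogram and per-channel color moments, on a stand-in RGB image (bin count and image size are arbitrary choices, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)   # stand-in RGB image

# Color histogram: 8 bins per channel, concatenated -> 24-D, no spatial information.
hist = np.concatenate([
    np.histogram(img[..., c], bins=8, range=(0, 256))[0] for c in range(3)
]).astype(float)
hist /= hist.sum()                                  # normalise for image-size invariance

# Color moments: per-channel mean and standard deviation -> compact 6-D descriptor.
pixels = img.reshape(-1, 3).astype(float)
moments = np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])

print(hist.shape, moments.shape)                    # (24,) (6,)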

38

Color: Comparisons (Cont.)

Color method | Pros | Cons
DCD | Compact, robust, perceptual meaning | Needs post-processing for spatial info
CSD | Spatial info | Sensitive to noise, rotation and scale
SCD | Compact on need, scalability | No spatial info, less accurate if compact

39

Spatial Texture: Comparisons

Texture method | Pros | Cons
Texton | Intuitive | Sensitive to noise, rotation and scale; difficult to define textons
GLCM-based | Intuitive, compact, robust | High computation cost, not enough to describe all textures
Tamura | Perceptually meaningful | Too few features
SAR | Compact, robust, rotation invariant | High computation cost, difficult to define pattern size
FD | Compact, perceptually meaningful | Computation cost, sensitive to scale
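A minimal numpy sketch of a GLCM (grey-level co-occurrence matrix) for a single offset, with two common statistics derived from it (quantisation level and offset are arbitrary choices):

import numpy as np

def glcm(gray, levels=8, dx=1, dy=0):
    # Quantise to `levels` grey values and count co-occurrences at offset (dx, dy).
    q = (gray.astype(float) / 256 * levels).astype(int).clip(0, levels - 1)
    a = q[:q.shape[0] - dy, :q.shape[1] - dx]
    b = q[dy:, dx:]
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)
    return m / m.sum()

rng = np.random.default_rng(0)
P = glcm(rng.integers(0, 256, size=(32, 32)))
i, j = np.indices(P.shape)
contrast = ((i - j) ** 2 * P).sum()        # local grey-level variation
energy = (P ** 2).sum()                    # uniformity of the texture
print(contrast, energy)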

40

Spectral Texture: Comparisons (Cont.)

Texture method | Pros | Cons
FT/DCT | Fast computation | Sensitive to scale and rotation
Wavelet | Fast computation, multi-resolution | Sensitive to rotation, limited orientations
Gabor | Multi-scale, multi-orientation, robust | Needs rotation normalisation; loses spectral information due to incomplete coverage of the spectrum plane
Curvelet | Multi-resolution, multi-orientation, robust | Needs rotation normalisation
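A minimal sketch of the FT-style idea: take the 2-D power spectrum and use the energy in a few radial frequency bands as the texture feature (numpy only; the number of bands is an arbitrary choice):

import numpy as np

def spectral_energy(gray, n_bands=4):
    # Energy of the centred 2-D power spectrum in concentric frequency bands.
    spec = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2
    h, w = gray.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h / 2, xx - w / 2)
    edges = np.linspace(0, r.max() + 1e-9, n_bands + 1)
    feats = [spec[(r >= lo) & (r < hi)].sum() for lo, hi in zip(edges[:-1], edges[1:])]
    return np.array(feats) / spec.sum()

rng = np.random.default_rng(0)
print(spectral_energy(rng.normal(size=(64, 64))))   # 4 band-energy features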

41

Shape

Chart Source: [Zhang and Lu 2004]

42

Chart Source: [M. Yang, K. Kpalma, J. Ronsin 2008]

Shape (Cont.)

43

Shape (Cont.)

Contour based: calculate shape features only from the boundary of the shape.

Region based: extract features from the entire region.

44

Shape (Cont.)
• Contour-based techniques are more sensitive to noise than region-based techniques.
• Therefore, image retrieval usually employs region-based shape features.
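A minimal numpy sketch of simple region-based shape features computed from a binary object mask (the rectangle mask is a toy example):

import numpy as np

def region_shape_features(mask):
    # mask: binary array, 1 inside the object region, 0 outside.
    ys, xs = np.nonzero(mask)
    area = len(xs)
    cx, cy = xs.mean(), ys.mean()
    # Normalised second-order central moments: translation- and scale-invariant.
    eta20 = ((xs - cx) ** 2).sum() / area ** 2
    eta02 = ((ys - cy) ** 2).sum() / area ** 2
    return np.array([area, cx, cy, eta20 + eta02])

mask = np.zeros((32, 32), dtype=int)
mask[8:24, 10:20] = 1                        # a filled rectangle as the "shape"
print(region_shape_features(mask))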

45

Learning Methods
• SVM
• ANN
• Tree
• Parametric
• Non-Parametric

46

Learning Methods: Comparisons

Annotation method | Pros | Cons
SVM | Small sample, optimal class boundary, non-linear classification | Single labelling, one class at a time, expensive trial runs, sensitive to noisy data, prone to over-fitting
ANN | Multiclass outputs, non-linear classification, robust to noisy data, suitable for complex problems | Single labelling, sub-optimal, expensive training, complex black-box classification
DT | Intuitive, semantic rules, multiclass outputs, fast, allows missing values, handles both categorical and numerical values | Single labelling, sub-optimal, needs pruning, can be unstable
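Because a plain SVM assigns one class at a time, multi-word annotation is usually set up as one binary classifier per vocabulary word. A minimal scikit-learn sketch of that one-vs-rest setup (scikit-learn is not mentioned in the slides; features and tags below are random placeholders):

import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))                             # 100 images, 64-D features
tags = ([["sky", "sea"], ["sky"], ["grass"]] * 34)[:100]   # toy keyword sets

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)                          # binary word-indicator matrix

clf = OneVsRestClassifier(LinearSVC())               # one linear SVM per word
clf.fit(X, Y)
pred = clf.predict(X[:1])                            # predicted word set for one image
print(mlb.inverse_transform(pred))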

47

Learning Methods: Comparisons (Cont.)

Annotation method | Pros | Cons
Non-parametric | Multi-labelling, model free, fast | Large number of parameters, large sample, sensitive to noisy data
Parametric | Multi-labelling, small sample, good approximation of unknown distribution | Predefined distribution, expensive training, approximated boundary
Metadata | Uses both textual and visual features | Difficult to relate visual features with textual features; difficult textual feature extraction
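A minimal sketch of the non-parametric, nearest-neighbour idea (in the spirit of methods such as TagProp, cited later in this deck): transfer the most frequent keywords of the K visually closest training images. Numpy only; data are random placeholders.

import numpy as np
from collections import Counter

def knn_annotate(query_feat, train_feats, train_tags, k=3, n_words=2):
    # Rank training images by visual distance and vote over their keywords.
    d = np.linalg.norm(train_feats - query_feat, axis=1)
    votes = Counter()
    for i in np.argsort(d)[:k]:
        votes.update(train_tags[i])
    return [w for w, _ in votes.most_common(n_words)]

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(50, 16))              # toy image features
train_tags = [["sky", "sea"] if i % 2 else ["grass", "tree"] for i in range(50)]
print(knn_annotate(rng.normal(size=16), train_feats, train_tags))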

48

Deep Learning
• Deep belief networks
• Deep Boltzmann machines
• Deep convolutional neural networks
• Deep recurrent neural networks
• Hierarchical temporal memory

Source: https://en.wikipedia.org/wiki/List_of_machine_learning_concepts
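A minimal deep-CNN sketch for annotation, framed as multi-label tagging with one sigmoid output per vocabulary word. PyTorch is used here only for brevity; it is not one of the toolboxes discussed later, and the architecture is a toy, not that of any cited paper.

import torch
import torch.nn as nn

class TinyAnnotator(nn.Module):
    def __init__(self, vocab_size=260):             # e.g. the Corel5k vocabulary size
        super().__init__()
        self.features = nn.Sequential(               # deep convolutional feature extractor
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 16 * 16, vocab_size)   # one score per word

    def forward(self, x):                            # x: (batch, 3, 64, 64)
        h = self.features(x).flatten(1)
        return torch.sigmoid(self.head(h))           # independent per-word probabilities

model = TinyAnnotator()
scores = model(torch.randn(2, 3, 64, 64))            # (2, 260) word probabilities
print(scores.shape)

Training such a network would use a per-word binary cross-entropy loss, since one image can carry several words at once.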

49

Deep Learning (Cont.)

Source: Ranzato, 4 October 2013, Slides

50

Deep Learning (Cont.)

• A potential problem with deep learning*: the optimization task
• See: Bengio's articles, and talks about deep learning on YouTube, e.g. Ranzato, 4 October 2013: https://www.youtube.com/watch?v=clgMTk5V2Sk

*Ranzato, 4 October 2013, Slides

51

Outline

Introduction To Image Annotation
• What?
• Why?

Story Behind AIA
• Components of AIA
• Progress of AIA
• Issues & Conclusions

Going deeper!
• Feature Extraction
• Learning Methods
• Deep Learning
• Conclusions

Useful Information
• Recent Articles
• Toolbox
• Databases
• Authors

Conclusions
• References

52

2009, Shallow

Source: Venkatesh N. Murthy, S. Maji, R. Manmatha, Automatic Image Annotation using Deep Learning Representations, 2015

Useful Information: Recent Articles

53

Which one ?? !

1 Theoretical Limitations of Shallow Architectures

2 Theoretical Advantages of Deep Architectures

54

Source: B. Klein, G. Lev, G. Sadeh, and L. Wolf, Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation 2015

Useful Information: Recent Articles (Cont.)

55

Useful Information: Toolbox

MatConvNet
• MatConvNet is a MATLAB toolbox implementing Convolutional Neural Networks (CNNs) for computer vision applications. It is simple, efficient, and can run and learn state-of-the-art CNNs. Several example CNNs are included to classify and encode images.

Caffe
• Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley. Caffe is released under the BSD 2-Clause license.

56

Useful Information: Databases

Corel5k: an important benchmark for keyword-based image retrieval and image annotation. 5,000 images manually annotated with 1 to 5 keywords; the vocabulary contains 260 words.

ESP Game: this data set is obtained from an online game in which two players, who cannot communicate outside the game, gain points by agreeing on words describing the image.

IAPR TC12: this set of 20,000 images, accompanied by descriptions in several languages, was initially published for cross-lingual retrieval.

57

Useful Information: Databases
• Other databases: Flickr 8, 10, 30

Table Source: M. Guillaumin, T. Mensink, J. Verbeek and C. Schmid, TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Auto-Annotation

58

Useful Information: Authors

Cordelia Schmid
• Research director, INRIA
• Computer vision, object recognition, video recognition, learning

Li Fei-Fei
• Professor, Stanford University
• Artificial intelligence, machine learning, computer vision, neuroscience

Yoshua Bengio
• Professor, University of Montreal, Computer Science
• Machine learning, deep learning, artificial intelligence

Reported by: Google Scholar

59

Useful Information: Authors (Cont.)

Richard Socher
• MetaMind
• Deep learning, machine learning, natural language processing, computer vision
• Recursive Deep Learning for Natural Language Processing and Computer Vision, PhD thesis, Computer Science Department, Stanford University
• 2014 Arthur L. Samuel Best Computer Science PhD Thesis Award

Reported by: Google Scholar

60

Outline

Introduction To Image Annotation
• What?
• Why?

Story Behind AIA
• Components of AIA
• Progress of AIA
• Issues & Conclusions

Going deeper!
• Feature Extraction
• Learning Methods
• Deep Learning
• Conclusions

Useful Information
• Recent Articles
• Toolbox
• Databases
• Authors

Conclusions
• References

61

How to assign a word to an image?

What are the components of an Automatic Image Annotation system?
• Feature Extraction
• Classification Methods
• Pattern Recognition!

Components of AIA

Classical or shallow structure issues

Conclusions!

62

1. High-dimensional feature analysis.
2. How to build an effective annotation model?
3. Annotation and ranking are currently done online simultaneously in multiple-labelling annotation approaches, which is not efficient for image retrieval.
4. Lack of a standard vocabulary and taxonomy.
5. There is no commonly accepted image database.
6. Insufficient depth of architectures and locality of estimators [Bengio, 2009].

Picture Source: Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning

Source: Zhang, D., Islam, M. M., & Lu, G. (2012). A review on automatic image annotation techniques. Pattern Recognition, 45(1), 346–362. doi:10.1016/j.patcog.2011.05.013

Conclusions (Cont.)

63

References

Recommended