In computer vision, context has been mostly ignored in the last two decades. We show that in understanding images, context plays a more significant role than content.
© Ramesh Jain
Ramesh Jain (with Pinaki Sinha and other collaborators)
Department of Computer Science University of California, Irvine
Contenxt: Bridging the Semantic Gap
Football Highlight System: Automatic Segmentation
- 15 college teams
- All games – 4 cameras
- 30 minutes after the game
Find Mubarak Shah
Image Search: Ramesh Jain
Gives some details
Tells me who may not be the ‘real’ Ramesh Jain
Finds people who are my friends
Image Search: Finds activities
My current research
- EventWeb: connecting and accessing events
  - From Twitter, Facebook; from Web cams, Planetary Skin, …
  - Connecting environments
- Personal Media Management: images, video, text, …
- Doing Computer Vision Correctly
Computer Vision
Computer vision is the science and technology of machines that see.
As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images.
From Wikipedia.
How do you Search for Images?
- Use a content-based image retrieval engine from XYZ University?
- Use a content-based image search engine from a company? Is there any? I tried to do one in 1994 and built Virage as a result – but …
- Or do you use just a ‘text’ search engine?
Text Search Engines
How good are text search engines at object recognition?
Let’s look at some real working systems by searching for people here.
Disruptive Stages in Computing:1
Data: Numbers, Text, Statistics, Sensors (Video)
Data (Computation)
Computing 1: Data
- Mainframes and workstations
- Main applications: scientific and engineering, business
- Users: sophisticated, expected to be trained
- Dominant technology: computing
Disruptive Stages in Computing:2
Data: Numbers, Text, Statistics, Sensors (Video)
Data (Computation)
Information: Search, Specialized sources
Information (Communication)
Computing 2: Information
- PC and Internet
- Main applications: information, communication
- Users: common people in the ‘developed world’; easy access using keyboards
- Dominant technology: authoring tools, access mechanisms, sharing
What Next?
Disruptive Stages in Computing:3
Data: Numbers, Text, Statistics, Sensors (Video)
Data (Computation)
Information: Search, Specialized sources
Information (Communication)
Experience: Direct observation or
participation
Experience (Insights)
Computing 3: Experience
- Experiential devices: mobile phones
- Main applications: experience management, experiential communication
- Users: humans; no language issues
- Dominant technology: sensor understanding
- Vision and audio will be dominant
The Challenge
Connecting
Bits and Bytes
Alphanumeric Characters
Lists, Arrays, Documents, Images …
Transformations
Semantic Gap
The semantic gap is the lack of coincidence between the information that one can extract from the (visual) data and the interpretation that the same data have for a user in a given situation. A linguistic description is almost always contextual, whereas an (image) may live by itself.
Content-Based Image Retrieval at the End of the Early Years, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Arnold Smeulders et al., December 2000
Data Information Experience
Rorschach Test
Psychologists use this test to examine a person's personality characteristics and emotional functioning.
Falling Tree and George Berkeley
"If a tree falls in a forest and no one is around to hear it, does it make a sound?"
"No. Sound is the sensation excited in the ear when the air or other medium is set in motion."
Observation, Reality, and Perception.
Context
- text surrounding word or passage: the words, phrases, or passages that come before and after a particular word or pas…
- surrounding conditions: the circumstances or events that form the environment within which something exists or takes …
- data transfer structure: a data structure used to transfer electronic data to and from a business management system
Content
- amount of something in container: the amount of something contained in something else
- subject matter: the various issues, topics, or questions dealt with in speech, discussion, or a piece of writing
- meaning or message: the meaning or message contained in a creative work, as distinct from its appearance, form, or style
The Story of Computer Vision
The Psychology of Computer Vision (McGraw-Hill Computer Science Series), 1975.
Marvin Minsky and the summer project to solve computer vision.
D.L. Waltz, Understanding Line Drawings of Scenes with Shadows.
MSYS: A System for Reasoning about Scenes
Harry Barrow and Martin Tenenbaum
April 1976
MSYS: Relational Constraints
Relaxation labelling algorithms — a review J Kittler and J Illingworth
Image and Vision Computing Volume 3, Issue 4, November 1985, Pages 206-216
Abstract: An important research topic in image processing and image interpretation methodology is the development of methods to incorporate contextual information into the interpretation of objects. Over the last decade, relaxation labelling has been a useful and much studied approach to this problem. It is an attractive technique because it is highly parallel, involving the propagation of local information via iterative processing. The paper surveys the literature pertaining to relaxation labelling and highlights the important theoretical advances and the interesting applications for which it has proven useful.
Serge Belongie and Co-Researchers
semantic context (probability), spatial context (position) and scale context (size).
Modeling the World
- Data (Semantic Web)
- Objects (search companies, …)
- Events (relationships among objects and attributes)
Both Objects and Events are essential to model the world.
Events
- Take place in the real world.
- Are captured using different sensory mechanisms; each sensor captures only a limited aspect of the event.
- Are used to understand a situation.
What is in an Event?
[Figure: events plotted against time and one-dimensional space]
History: Gopher to Google
We had the Internet: lots of computers connected to each other, and the computers had files on them. We had Gopher, FTP, and similar mechanisms.
Tim Berners-Lee thought:
Suppose all the information stored on computers everywhere were linked.
Suppose I could program my computer to create a space in which anything could be linked to anything.
Others – including Vannevar Bush – had that idea earlier, but the technology was not ready.
That resulted in the Web
DocumentWeb:
- Each node is a ‘page’ or a document.
- Pages are linked through explicit referential links.
Then Came Google, Facebook, Twitter
- Search, Maps, …
- Social Network
- Events: Twitter status updates, Eventful
Evolution of Search
- Alphanumeric structured data: databases
- Information retrieval
- Search
- Multimedia search
- Real-time search (event search)
Will lead to identifying situations.
Continuing the Evolution of the Web
Consider a Web in which each node:
- Is an event
- Has informational as well as experiential data
- Is connected to other nodes using referential, structural, relational, and causal links
Explicit links can be created by anybody. This EventWeb is connected to other Webs.
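The node-and-link structure above can be sketched as a small data model. This is an illustrative sketch – the class and field names are assumptions, not the actual EventWeb implementation:

```python
from dataclasses import dataclass, field

# The four link types named above: referential, structural, relational, causal.
LINK_TYPES = {"referential", "structural", "relational", "causal"}

@dataclass
class EventNode:
    """One node of an EventWeb: an event with informational and
    experiential data, linked to other event nodes."""
    name: str
    info: dict = field(default_factory=dict)    # informational data
    media: list = field(default_factory=list)   # experiential data (photos, video, ...)
    links: list = field(default_factory=list)   # (link_type, target) pairs

    def link(self, link_type, target):
        """Add an explicit, typed link to another event node."""
        if link_type not in LINK_TYPES:
            raise ValueError(f"unknown link type: {link_type}")
        self.links.append((link_type, target))

# Usage: anyone can create an explicit link, e.g. a game causing a celebration.
game = EventNode("UCI football game")
party = EventNode("post-game celebration", media=["IMG_0344.jpg"])
game.link("causal", party)
```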
Connectors
- My 5 senses are connectors between ‘me’ and the world.
- We use our sensors (vision, audio, …) to experience the world.
- Sensors could be the interface between cyberspace and the real world.
- Sensors are placed for ‘detecting events’.
- How do you decide what sensors to put at any place? Would you put a sensor if nothing interesting ever happens at a place?
From Atomic Events to Composite Events
- Spatial and temporal aggregation
- Assimilation
- Composition using sophisticated models
- Ontological models could be used
- May include causality
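As a toy illustration of the temporal-aggregation step, here is a simple gap-threshold rule – a hypothetical stand-in for the sophisticated and ontological models mentioned above:

```python
def aggregate(timestamps, max_gap=3600):
    """Group atomic-event timestamps (in seconds) into composite events:
    a new composite event starts whenever the gap between consecutive
    atomic events exceeds max_gap (here, one hour)."""
    composites = []
    for t in sorted(timestamps):
        if composites and t - composites[-1][-1] <= max_gap:
            composites[-1].append(t)   # close enough: same composite event
        else:
            composites.append([t])     # large gap: start a new composite
    return composites

# Five photos: three in quick succession, then two much later
# -> two composite events.
groups = aggregate([0, 100, 200, 10000, 10050])
```

A real system would aggregate over space as well as time, and use event ontologies rather than a single threshold.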
[Figure: an EventWeb plotted against time and one-dimensional space]
Types of Context
- Relationships among different objects, and even their subparts, in the real world
- Environmental parameters of the digital device at the time the photo is taken
- Knowledge about the person taking the photo, and even of the person interpreting it
- The real-world situation in which the data is interpreted
Context Starts much Before the Photo is Taken
- Where
- When
- Why
- Who (the photographer)
- Which device
- Parameters of the device
Modern Cameras
- Are more than ‘camera obscura’: they capture an event.
- Many sensors capture scene context and store it along with the intensity values.
- EXIF data is all metadata related to the event: exposure time, aperture diameter, flash, metering mode, ISO rating, focal length, time, location, (soon) face.
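The EXIF story above can be made concrete. A minimal sketch that pulls the optical context out of raw tag/value pairs – the tag IDs come from the EXIF specification, but the hand-rolled table and function names are illustrative, not a full parser:

```python
# Tag IDs per the EXIF specification; a small hand-rolled table so the
# sketch needs no imaging library.
EXIF_TAGS = {
    0x829A: "ExposureTime",
    0x829D: "FNumber",          # aperture
    0x8827: "ISOSpeedRatings",
    0x9003: "DateTimeOriginal",
    0x9207: "MeteringMode",
    0x9209: "Flash",
    0x920A: "FocalLength",
}

def scene_context(raw_exif):
    """Map raw (tag_id, value) EXIF pairs to named scene-context fields,
    dropping tags that are not part of the optical context."""
    return {EXIF_TAGS[t]: v for t, v in raw_exif if t in EXIF_TAGS}

# e.g. an exposure of 1/60 s at f/2.8 with flash fired;
# 0x0110 (camera model) is filtered out as non-optical metadata.
ctx = scene_context([(0x829A, (1, 60)), (0x829D, (28, 10)),
                     (0x9209, 1), (0x0110, "DSC-T2")])
```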
Sony CyberShot DSC-T2 Touchscreen 8MP Digital Camera with Smile Detection
Information in a Digital Photo
- Exposure time, focal length, aperture, flash, ISO rating
- Date, time, time zone
- Latitude, longitude
- Voice tags, preset modes, ontology, etc.
Experiential Media Management Environment
- Event-based
- Should be able to deal with ‘multimedia’: photos, audio, video, text, information and data, …
- Searching based on events and media
- Storytelling
EMME Event Cycle
[Diagram: atomic events enter from EXIF data, image features, tags/context, and photo-stream segmentation; they are grouped, linked, and assimilated – using an event ontology and user annotations – into an Event Base, which supports event presentation/navigation, storytelling, search, and exploration.]
Using EMME (ACM MM 2009)
- Searching for photos
- Creating albums: professional, family, tourism
- Telling stories: What did I do in Beijing?
Scenario: In December 2009, I have 20,000 pictures taken in 2008. How do I (semi-automatically) select 25 to send to my mother, the uncle that I hate, my personal friend, my professional friend, …?
Contenxt = Content + Context
Context is as powerful as content, possibly more so, in understanding audio-visual information.
Examples of Photos from the Unsupervised Clusters: High Exposure Time, Small Aperture
Examples of Photos from the Unsupervised Clusters: Low Aperture (High DOF), Low FL (Wide Angle)
Examples of Photos from the Unsupervised Clusters: High Aperture (Low DOF), High FL (Telephoto)
Examples of Photos from the Unsupervised Clusters: Photos with Flash: Indoor shots
Examples of Photos from the Unsupervised Clusters: Photos with Flash: Darker Outdoors
Photos can be tagged using only EXIF!
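The cluster slides above came from unsupervised clustering over EXIF parameters. A minimal k-means sketch over hypothetical (log exposure time, f-number) features – illustrative, not the authors' exact procedure:

```python
def kmeans(points, k, iters=20):
    """Plain k-means over per-photo EXIF feature vectors.
    Centers start from the first k photos for determinism
    (a real system would use random restarts)."""
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # assign each photo to its nearest cluster center
        labels = [min(range(k),
                      key=lambda c: sum((x - y) ** 2
                                        for x, y in zip(pt, centers[c])))
                  for pt in points]
        # move each center to the mean of its assigned photos
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members)
                                   for dim in zip(*members))
    return labels

# Hypothetical photos as (log2 exposure time, f-number) pairs: two
# long-exposure/small-aperture shots vs. two short-exposure/wide-aperture shots.
photos = [(-1.0, 16.0), (-5.0, 2.8), (-1.1, 15.0), (-5.2, 2.0)]
labels = kmeans(photos, k=2)
```

With only these two EXIF dimensions, the two shooting conditions already fall into separate clusters – mirroring the "high exposure time, small aperture" grouping above.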
Guess the Tags!!
Using Image Features Only:
Scenery, City Streets, Illuminations, People Posing for Photo, Wildlife.
Using Optical Parameters:
Single Person Indoors, Portraits, Party Indoors, People at Dinner.
Confusing Background!!
Predicted tags:
- Using image features only: Scenery, City Streets, People Posing Outdoors, Group Photo Indoors, Wildlife
- Using optical metadata and thumbnail features: Group Photo Indoors, Single Person Indoors, Indoor Party, Indoor Artifact, Illuminations
Automatic Annotation
- Use both content and optical context.
- How do we combine them?
- Is the optical context really useful for annotation?
- What should be the nature of the annotations? Grass, sky, …? People, animals, …?
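One simple way to combine the two signals is late fusion: a weighted average of per-tag scores from the content-based and context-based classifiers. The weight `w` and the tag scores below are hypothetical illustrations, not values from the experiments:

```python
def fuse(p_content, p_context, w=0.5):
    """Late fusion: weighted average of per-tag confidence scores from a
    content-based and a context-based classifier. w is a tuning knob;
    tags missing from one classifier score 0 there."""
    tags = set(p_content) | set(p_context)
    return {t: w * p_content.get(t, 0.0) + (1 - w) * p_context.get(t, 0.0)
            for t in tags}

# Content alone says "scenery"; optical context (flash fired, short focal
# length) pushes toward an indoor tag, and wins after fusion.
scores = fuse({"scenery": 0.7, "wildlife": 0.6},
              {"indoor party": 0.9, "scenery": 0.1})
best = max(scores, key=scores.get)
```

More elaborate schemes (learned fusion weights, joint models) are possible; the point is only that neither signal needs to be discarded.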
More on EXIF-Related Experiments for Photo Tagging
- Build models separately for point-and-shoot vs. SLR cameras, since their optical parameters vary a lot.
- Do rigorous experiments using the same dataset (NUS-WIDE or MIR Flickr) to find how content-based classifiers compare with context-based classifiers, and how much we gain by including both.
Personal-Photo-EventWeb
Singapore – Outdoor -- People
People-No Face - Outdoor
Sharing Photos
- Taking photos is (almost) zero cost. People now ‘shoot first – see later’.
- Let me share the 344 photos that I took yesterday with you: here, on Flickr, on Facebook.
- Tweeting cameras: $12.30 at Amazon.com.
This is a serious problem now. Today.
I want to share, but …
[Diagram: the photo-sharing problem – Flickr, Facebook]
Our Solution: Photo Summarization
Many TYPES of summaries to choose from:
- Time/face based
- Image-feature based
Applications:
- Sharing with friends without making them enemies
- Uploading to your favorite sites
- Selecting exemplar photos for printing
- Refreshing your memory
- Photo frames
Soon it will be available on your camera.
Technical Specifications: uses and extends the state of the art
- EXIF
- GIST features
- Faces
- Color histograms
- Affinity propagation algorithm
Performance: great! Very intuitive, very fast.
Human in the loop: fine tuning. We believe – you are the BOSS.
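Affinity propagation is the exemplar-picking step: given pairwise similarities between photos, it selects a few photos that best represent the rest. A bare-bones numpy sketch of the algorithm (Frey & Dueck's update rules), not the system's tuned implementation:

```python
import numpy as np

def affinity_propagation(S, damping=0.5, iters=200):
    """Minimal affinity propagation. S is an n-by-n similarity matrix
    between photos, with exemplar 'preferences' on the diagonal;
    returns the indices of the photos chosen as exemplars."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibility: evidence that k should serve i
    A = np.zeros((n, n))  # availability: evidence that i should pick k
    for _ in range(iters):
        # responsibility: r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        AS = A + S
        top = AS.argmax(axis=1)
        first = AS[np.arange(n), top]
        AS[np.arange(n), top] = -np.inf
        second = AS.max(axis=1)
        Rnew = S - first[:, None]
        Rnew[np.arange(n), top] = S[np.arange(n), top] - second
        R = damping * R + (1 - damping) * Rnew
        # availability: a(i,k) = min(0, r(k,k) + sum of positive r(i',k))
        Rpos = np.maximum(R, 0)
        np.fill_diagonal(Rpos, R.diagonal())
        Anew = Rpos.sum(axis=0)[None, :] - Rpos
        diag = Anew.diagonal().copy()   # a(k,k) is not clamped at zero
        Anew = np.minimum(Anew, 0)
        np.fill_diagonal(Anew, diag)
        A = damping * A + (1 - damping) * Anew
    # a photo is an exemplar if its combined self-evidence is positive
    return np.where((A + R).diagonal() > 0)[0]
```

In the summarizer's setting the similarities would come from GIST, color histograms, faces, and EXIF; the diagonal preference value controls how many exemplar photos the summary keeps.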
Photo Summarization
Original Data Set
Photo-Summarization using content
Photo-Summarization using Faces
Using Contenxt to find Unique People in Photostreams from Multiple People in an Event
Using Clothing + Face Features (Contenxt):
Step 1: Detect faces across all photostreams.
Step 2: Detect clothing across all photostreams.
Step 3: Cluster clothing based on color.
Step 4: Find unique faces within each clothing cluster.
Step 5: Iterate through steps 3–4, refining the parameters, to get a unique set of people.
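Steps 3–4 can be sketched with a crude dominant-color quantizer standing in for the real clothing-color clustering – the bin count, function names, and data layout here are illustrative assumptions, not the paper's method:

```python
from collections import Counter

def dominant_color_bin(pixels, bins=4):
    """Quantize clothing-region pixels into a coarse RGB histogram and
    return the fullest bin – a crude stand-in for step 3's clustering."""
    counts = Counter((r * bins // 256, g * bins // 256, b * bins // 256)
                     for r, g, b in pixels)
    return counts.most_common(1)[0][0]

def cluster_by_clothing(people):
    """people: list of (face_id, clothing_pixels) pairs. Group faces whose
    clothing falls into the same dominant-color bin; each resulting
    cluster is then searched for unique faces (step 4)."""
    clusters = {}
    for face_id, pixels in people:
        clusters.setdefault(dominant_color_bin(pixels), []).append(face_id)
    return clusters

# Two people in red shirts end up in one cluster, one in blue in another.
red, blue = [(250, 10, 10)] * 5, [(10, 10, 250)] * 5
groups = cluster_by_clothing([("A", red), ("B", red), ("C", blue)])
```

The Contenxt point survives the simplification: clothing (context) narrows the search so that face matching (content) only has to distinguish people within a cluster.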
Clothing Cluster 1 with corresponding Faces
Unique Faces in Cluster 1: (each row is one person)
Clothing Cluster 2 with corresponding Faces
Unique Faces in Cluster 2: (each row is one person)
Clothing Cluster 3 with corresponding Faces
Unique Faces in Cluster 3: (each row is one person)
Conclusions and Future Research
- Content (data) is important for computer vision.
- Context is more important than content for solving real (and hard) problems in vision.
- Real success is only possible by using Contenxt.