Grammar of Image Zhaoyin Jia, 03-30-2009. Problems Enormous amount of vision knowledge: Computational complexity Semantic gap …… Classification,

Grammar of Image

Zhaoyin Jia, 03-30-2009

Problems

Enormous amount of vision knowledge:

Computational complexity

Semantic gap

……


Classification, Recognition

Task of image parsing

Objectives in this paper

Framework for vision: And-Or Graph

Algorithm for this framework: top-down/bottom-up computation

Generalization from small samples: use Monte Carlo simulation to synthesize more configurations

Fill the semantic gap

Grammar

Language: co-occurrence of s is more than chance

Image: Parallel; T-junction

$\frac{p(s \mid A, B)}{p(s \mid A)\, p(s \mid B)} \;\ldots\; 1$

CONSTANTINOPLE

Formulation of grammar

Start symbol: S

Non-terminal nodes: V_N

Terminal nodes: V_T

Production rules: R

S → NP VP

VP → VP PP

VP → V NP

……

Image grammar

Start symbol: S

Non-terminal nodes: V_N

Terminal nodes: V_T

Production rules: R

Overlapping parts/Ambiguity

Similar color, occlusion, etc.


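Stochastic Context Free Grammar

For each non-terminal $A \in V_N$, we have production rules: $A \to \beta_1 \mid \beta_2 \mid \dots \mid \beta_{n(A)}$

with a probability associated with each one: $p(A \to \beta_i)$, where $\sum_{i=1}^{n(A)} p(A \to \beta_i) = 1$

Probability of a parse tree: $p(pt) = \prod_{(A \to \beta) \in pt} p(A \to \beta)$

Probability of a sentence: the sum of $p(pt)$ over all parse trees $pt$ that produce the sentence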
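A minimal sketch of the SCFG quantities on this slide, assuming a toy grammar: the probabilities of the rules for each non-terminal sum to one, a parse tree's probability is the product of the probabilities of the rules it uses, and a sentence's probability is the sum over its parse trees. The rule probabilities and the tree encoding are illustrative assumptions, not values from the talk.

```python
from math import prod, isclose

# Toy SCFG (hypothetical rules and probabilities, for illustration only).
# Each non-terminal maps to a list of (right-hand side, probability).
RULES = {
    "S":  [(("NP", "VP"), 1.0)],
    "VP": [(("VP", "PP"), 0.3), (("V", "NP"), 0.7)],
}

def rule_prob(lhs, rhs):
    """Probability attached to the production lhs -> rhs."""
    for candidate, p in RULES[lhs]:
        if candidate == rhs:
            return p
    raise KeyError((lhs, rhs))

def tree_prob(tree):
    """Probability of a parse tree = product of its rule probabilities.

    A tree is (symbol, [children]); symbols without rules count as leaves.
    """
    symbol, children = tree
    if symbol not in RULES:
        return 1.0
    rhs = tuple(child[0] for child in children)
    return rule_prob(symbol, rhs) * prod(tree_prob(c) for c in children)

def sentence_prob(parse_trees):
    """Probability of a sentence = sum over all of its parse trees."""
    return sum(tree_prob(t) for t in parse_trees)

def leaf(symbol):
    return (symbol, [])

# S -> NP VP, VP -> VP PP, inner VP -> V NP
pt = ("S", [leaf("NP"), ("VP", [("VP", [leaf("V"), leaf("NP")]), leaf("PP")])])
print(tree_prob(pt))
```

An ambiguous sentence would pass several trees to `sentence_prob`; with a single tree the two quantities coincide.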

Stochastic Grammar with Context

From left to right: bi-gram model (Markov chain); for a sentence with n words: $p(w_1, \dots, w_n) = p(w_1) \prod_{i=2}^{n} p(w_i \mid w_{i-1})$

Non-local relations: tree model
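As a rough illustration of the bi-gram (Markov chain) model, the sketch below estimates the first-word and transition probabilities by relative frequency and multiplies them along a sentence. The three-sentence corpus is made up, and no smoothing is applied, so unseen word pairs get probability zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Estimate p(w_1) and p(w_i | w_{i-1}) by relative frequency."""
    first, pair, prev = Counter(), Counter(), Counter()
    for words in sentences:
        first[words[0]] += 1
        for a, b in zip(words, words[1:]):
            pair[(a, b)] += 1
            prev[a] += 1
    n = len(sentences)
    p_first = {w: c / n for w, c in first.items()}
    p_next = {(a, b): c / prev[a] for (a, b), c in pair.items()}
    return p_first, p_next

def sentence_prob(words, p_first, p_next):
    """p(w_1..w_n) = p(w_1) * prod_i p(w_i | w_{i-1}); unseen events get 0."""
    p = p_first.get(words[0], 0.0)
    for a, b in zip(words, words[1:]):
        p *= p_next.get((a, b), 0.0)
    return p

# Hypothetical toy corpus.
corpus = [["the", "dog", "runs"], ["the", "cat", "runs"], ["a", "dog", "sleeps"]]
p_first, p_next = train_bigram(corpus)
print(sentence_prob(["the", "dog", "runs"], p_first, p_next))
```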

New issues in Image Grammar

Loss of “left to right” order: region adjacency graph

Scaling produces different terminals in the parse tree

Switch between texture and structure
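Because an image has no left-to-right order, neighbourhood is described by a region adjacency graph. The following is a minimal sketch, assuming the image has already been segmented into an integer label grid and that two regions are adjacent when they share a 4-connected pixel boundary; the tiny `segmentation` grid is hypothetical.

```python
def region_adjacency_graph(labels):
    """Build a region adjacency graph from a 2-D grid of region labels.

    Two regions are adjacent when they share a 4-connected pixel boundary.
    Returns {region: set of neighbouring regions}.
    """
    graph = {}
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            graph.setdefault(a, set())
            # Look only right and down, so each pixel pair is visited once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    if a != b:
                        graph[a].add(b)
                        graph.setdefault(b, set()).add(a)
    return graph

# Hypothetical 3x3 segmentation with four regions.
segmentation = [
    [0, 0, 1],
    [0, 2, 1],
    [3, 2, 1],
]
rag = region_adjacency_graph(segmentation)
print(rag)
```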

Building the image grammar

Visual vocabulary: primitives, sketch graph, textons…

Relations and configurations: co-occurrence, attached, hinged, supported, occluded…

And-Or Graph representation: embedding the image grammar

Learning/testing the parse graph: find the possible inference

Database Lotus Hill Institute Dataset

636,748 images, 3,927,130 Physical Objects

A few hundred are free

Benjamin Yao, Xiong Yang, and Song-Chun Zhu, “Introduction to a large scale general purpose ground truth dataset: methodology, annotation tool, and benchmarks.” EMMCVPR, 2007

http://www.imageparsing.com/

Free Data

6 categories, 145 subsets:

Manmade Object: 75
Nature Object: 40
Objects in Scene: 6
Transportation: 9
UCLA Aerial Image: 5
UIUC Sport Activity: 10

Outline & segmentation of the object

http://yoshi.cs.ucla.edu/yao/data/


Segmentation of a scene (street)



Physical parts of the object


OBJECT1:truck
PART1:truck:body
PART2:truck:windshield
PART3:truck:headlight
PART4:truck:headlight
PART5:truck:headlight
PART6:truck:headlight
PART7:truck:rearview mirror
PART8:truck:rearview mirror
PART9:truck:rear light
PART10:truck:window
PART11:truck:frontal left wheel
PART12:truck:frontal right wheel
PART13:truck:back wheel
PART14:truck:back wheel
PART15:truck:carriage

Visual Vocabulary The “Lego Land”

Language

Visual Vocabulary

$\Phi_i$: function of image primitives

$\alpha_i$: a) geometric transformation, b) appearance

$\beta_i$: bonds between primitives

Visual Vocabulary

Sketch and Texture

$\Lambda = \Lambda_{SK} \cup \Lambda_{NSK}, \quad I = (I_{\Lambda_{SK}}, I_{\Lambda_{NSK}})$

S. C. Zhu, Y. N. Wu, and D. B. Mumford, “Minimax entropy principle and its applications to texture modeling,” Neural Computation, vol. 9, no. 8, pp. 1627–1660, November 1997

Primal sketch model

Input image

Sketch graph

Texture pixels

C. E. Guo, S. C. Zhu, and Y. N. Wu, “Primal sketch: Integrating texture and structure,” in Proceedings of International Conference on Computer Vision, 2003.


High level visual vocabulary Cloth: collar, left/right sleeves, hands

H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proceedings of IEEE Conference on Pattern Recognition and Computer Vision, New York, June 2006

Relations and configurations

Definition of relation:

$\{(s, t)\} \subset S \times S$

$E = \{(s, t; \gamma, \rho) : s, t \in S\}$

Bonds and relations: $\gamma$: structure, $\rho$: compatibility

Three types of relations:

Bonds and connections

Joints and junctions

Object interactions/semantics

Definition of configuration:

$C = \langle V, E \rangle, \quad V = \{A_i : A_i = (\Phi_i(x, y; \alpha_i), \beta_i)\}$
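One way to hold a configuration in code is as a small attributed graph: nodes for the entities at one level, and edges that carry a structure label and a compatibility score. The sketch below is only an assumed data layout; the class names, field names, and the truck parts and scores are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Primitive:
    """A node of the configuration: an entity with a position attribute."""
    name: str
    x: float
    y: float

@dataclass(frozen=True)
class Relation:
    """An edge (s, t; gamma, rho): structure label plus compatibility score."""
    s: str
    t: str
    gamma: str    # structure that binds s and t, e.g. "attached"
    rho: float    # real-valued compatibility between s and t

@dataclass
class Configuration:
    """A configuration: entities at one level plus the relations among them."""
    V: dict = field(default_factory=dict)
    E: list = field(default_factory=list)

    def add_node(self, p):
        self.V[p.name] = p

    def relate(self, s, t, gamma, rho):
        if s not in self.V or t not in self.V:
            raise KeyError("both ends of a relation must be nodes in V")
        self.E.append(Relation(s, t, gamma, rho))

    def neighbours(self, name):
        """Names of all nodes that share a relation with the given node."""
        return {e.t if e.s == name else e.s for e in self.E if name in (e.s, e.t)}

# Hypothetical part-level configuration of a truck.
C = Configuration()
for p in (Primitive("body", 0, 0), Primitive("wheel", -1, -1), Primitive("window", 0, 1)):
    C.add_node(p)
C.relate("body", "wheel", "attached", 0.9)
C.relate("body", "window", "attached", 0.8)
print(C.neighbours("body"))
```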

Relations

Bonds and connections: connect primitives into bigger graphs

$S = \{\beta_{ij},\ i = 1, 2, \dots, n,\ j = 1, 2, \dots, n(i)\}$

$E_{bond}(S) = \{(\beta_{ij}, \beta_{i'j'}; \gamma, \rho)\}$

$\gamma = (x, y, \theta)$

$\rho$: intensity/color compatibility

Relations Joint and junctions

Relations Object interactions

Configuration

Spatial layout of entities at a certain level

Primal sketch – parts – object – scene

$C = \langle V, E \rangle, \quad V = \{A_i : A_i = (\Phi_i(x, y; \alpha_i), \beta_i)\}$

Reconfigurable graphs

Treat bonds as random variables: address nodes

Inference of the configuration

Have the primal sketch of the image

Detect the ‘T-junction’

Simulated annealing to infer the Gestalt Law

R. X. Gao and S. C. Zhu, “From primal sketch to 2.1D sketch,” Technical Report, Lotus Hill Institute, 2006

Red dot: connect region

Black line: known edge

Green line: inferred connection

Reconfigurable graphs

Ru-Xin Gao, Tian-Fu Wu, Song-Chun Zhu, and Nong Sang, “Bayesian Inference for Layer Representation with Mixed Markov Random Field”

Source image

T-junction

Inferred connection

Layer extraction


And-Or Graph

Parse graph of the image: $pg = (pt, E)$

pt: parse tree of vocabulary

E: relations

Inferring the parse graph: $pg^* = \arg\max_{pg} p(pg \mid I)$

Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu, “Recursive top-down/bottom up algorithm for object recognition,” Technical Report, Lotus Hill Research Institute, 2007.

Contain all the valid parse graphs

And node, Or node, leaf-node

Relation between children of And node

Parse tree: assigning label on Or node
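To make "contains all the valid parse graphs" concrete, here is a small sketch that enumerates every parse tree of a toy And-Or graph: an And-node keeps all of its children, an Or-node switches to exactly one. The `GRAPH` dictionary and its node names are hypothetical, and relations between the children of And-nodes are left out.

```python
from itertools import product

# Hypothetical toy And-Or graph: ("and", children) composes all children,
# ("or", children) selects exactly one child, ("leaf",) is a terminal.
GRAPH = {
    "clock": ("and", ["frame", "hands"]),
    "frame": ("or", ["round", "square"]),
    "hands": ("or", ["two_hands", "three_hands"]),
    "round": ("leaf",),
    "square": ("leaf",),
    "two_hands": ("leaf",),
    "three_hands": ("leaf",),
}

def parse_trees(node):
    """Yield the terminal configuration of every parse tree rooted at node."""
    kind = GRAPH[node][0]
    if kind == "leaf":
        yield (node,)
    elif kind == "or":
        # An Or-node switches to one alternative at a time.
        for child in GRAPH[node][1]:
            yield from parse_trees(child)
    else:
        # An And-node combines one sub-parse from each of its children.
        options = [list(parse_trees(child)) for child in GRAPH[node][1]]
        for combo in product(*options):
            yield tuple(leaf for part in combo for leaf in part)

configs = list(parse_trees("clock"))
print(configs)
```

Two Or-nodes with two alternatives each give four configurations; the count grows multiplicatively with every additional Or-node, which is how a small graph can cover many configurations.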


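And-Or Graph

Definition: $G_{and\text{-}or} = \langle S, V_N, V_T, R, \Sigma, P \rangle$

$V_N = V^{and} \cup V^{or}$

$V^{or} \to V^{and}_1 \mid V^{and}_2 \mid \dots$

$V^{and} \to V^{or}_1, V^{or}_2, \dots$

$V_T = \{(\Phi(x, y; \alpha), \beta)\}$: image primitives

$R = \bigcup_m E_m, \quad E_m = \{(v_s, v_t; \gamma_{st}, \rho_{st})\}$: relations at all levels

$P$: probability model defined on the And-Or graph

$\Sigma$: valid configurations of terminal nodes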

Stochastic Model on And-Or graph

Terminal (leaf) nodes: $T(pg)$

And/Or nodes: $V^{and}(pg)$, $V^{or}(pg)$

Set of links: $E(pg)$

Switch variable at Or-node: $\omega(v)$

Attributes of primitives: $\alpha(t)$

$p(pg; \Theta, R, \Delta) = \frac{1}{Z(\Theta)} \exp(-\mathcal{E}(pg))$

$\mathcal{E}(pg) = \sum_{v \in V^{or}(pg)} \lambda_v(\omega(v)) + \sum_{t \in T(pg) \cup V^{and}(pg)} \lambda_t(\alpha(t)) + \sum_{(i,j) \in E(pg)} \lambda_{ij}(v_i, v_j, \gamma_{ij}, \rho_{ij})$

First term: SCFG, weighs the frequency at the children of Or-nodes

Second term: weighs the local compatibility of primitives (geometric and appearance)

Third term: spatial and appearance relations between primitives (parts or objects)
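The Gibbs form of the model can be sketched on a toy example in which the parse graphs are few enough to enumerate, so the normalizing constant is a plain sum. The energy adds one term per Or-node switch, one per node, and one per relation link. All potentials below are hypothetical, and in a realistic And-Or graph the sum over all valid parse graphs is not tractable this way.

```python
from math import exp

def energy(pg, lam_or, lam_node, lam_pair):
    """Energy of a parse graph: switch terms + node terms + pairwise terms."""
    e = sum(lam_or[(v, choice)] for v, choice in pg["switches"].items())
    e += sum(lam_node[t] for t in pg["nodes"])
    e += sum(lam_pair[frozenset(edge)] for edge in pg["edges"])
    return e

def gibbs(parse_graphs, lam_or, lam_node, lam_pair):
    """p(pg) = exp(-energy) / Z, with Z summed over the enumerated parse graphs."""
    weights = [exp(-energy(pg, lam_or, lam_node, lam_pair)) for pg in parse_graphs]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical potentials (lower energy means more probable).
lam_or = {("frame", "round"): 0.2, ("frame", "square"): 1.0}
lam_node = {"round": 0.0, "square": 0.0, "hands": 0.1}
lam_pair = {frozenset(("round", "hands")): 0.0, frozenset(("square", "hands")): 0.5}

pgs = [
    {"switches": {"frame": "round"}, "nodes": ["round", "hands"],
     "edges": [("round", "hands")]},
    {"switches": {"frame": "square"}, "nodes": ["square", "hands"],
     "edges": [("square", "hands")]},
]
probs = gibbs(pgs, lam_or, lam_node, lam_pair)
print(probs)
```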

Learning And-Or Graph

Learning the vocabulary $\Delta$ and the hierarchic And-Or Graph

Learning the relation set $R$, given $\Delta$

Learning the parameters $\Theta$, given $R$ and $\Delta$

$p(pg; \Theta, R, \Delta) = \frac{1}{Z(\Theta)} \exp(-\mathcal{E}(pg))$

$\mathcal{E}(pg) = \sum_{v \in V^{or}(pg)} \lambda_v(\omega(v)) + \sum_{t \in T(pg) \cup V^{and}(pg)} \lambda_t(\alpha(t)) + \sum_{(i,j) \in E(pg)} \lambda_{ij}(v_i, v_j, \gamma_{ij}, \rho_{ij})$

Discussed in the paper


Learning and Pursuing Relation Set R

Start from the Stochastic Context Free Grammar (a)

Learn the relations that maximally reduce the KL divergence to the observation (b-e)

Observation: $f(I, pg)$

Learning model: $p(pg; \Theta, R, \Delta)$

J. Porway, Z. Y. Yao, and S. C. Zhu, “Learning an And–Or graph for modeling and recognizing object categories,” Technical Report, Department of Statistics, 2007
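A rough sketch of one pursuit step, under the assumption that each candidate relation is summarized by a discrete histogram, once from the observed parse graphs and once from samples of the current model. The candidate with the largest KL divergence is the one the model currently explains worst, so adding it gives the largest reduction. The relation names and histogram values are made up.

```python
from math import log

def kl_divergence(f, p):
    """KL(f || p) for two discrete distributions over the same bins.

    Assumes p is strictly positive wherever f is positive.
    """
    return sum(fi * log(fi / pi) for fi, pi in zip(f, p) if fi > 0)

def pursue_relation(observed_hists, model_hists):
    """Pick the candidate relation whose statistics the current model fits worst."""
    gains = {name: kl_divergence(observed_hists[name], model_hists[name])
             for name in observed_hists}
    best = max(gains, key=gains.get)
    return best, gains

# Hypothetical histograms of two candidate relations (three bins each).
observed = {"relative_position": [0.7, 0.2, 0.1], "relative_scale": [0.4, 0.3, 0.3]}
model = {"relative_position": [1 / 3, 1 / 3, 1 / 3], "relative_scale": [1 / 3, 1 / 3, 1 / 3]}
best, gains = pursue_relation(observed, model)
print(best, gains)
```

After the chosen relation is added, the model would be refit and resampled before the next pursuit step; that loop is omitted here.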

Learning graph parameters

Approximating $p(pg; \Theta, R, \Delta)$ to $f(I, pg)$

Similar to texture synthesis

S. C. Zhu, Y. N. Wu, and D. B. Mumford, “Minimax entropy principle and its applications to texture modeling,” Neural Computation, vol. 9, no. 8, pp. 1627–1660, November 1997
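For the Or-node (SCFG) part of the parameters alone, fitting the model to the observations reduces to counting: the branching probability at an Or-node is the relative frequency of each choice in the observed parse graphs, and the corresponding potential is its negative log. The sketch below covers only that simplified case, with hypothetical observations; the relation terms need iterative fitting and are not shown.

```python
from collections import Counter, defaultdict
from math import log

def estimate_switch_potentials(observed_parse_graphs):
    """Relative-frequency estimate of Or-node branching, returned as -log p."""
    counts = defaultdict(Counter)
    for switches in observed_parse_graphs:
        for or_node, choice in switches.items():
            counts[or_node][choice] += 1
    potentials = {}
    for or_node, ctr in counts.items():
        total = sum(ctr.values())
        for choice, c in ctr.items():
            potentials[(or_node, choice)] = -log(c / total)
    return potentials

# Hypothetical observations: the Or-node "frame" chose "round" three times out of four.
observed = [{"frame": "round"}, {"frame": "round"}, {"frame": "round"}, {"frame": "square"}]
lam = estimate_switch_potentials(observed)
print(lam)
```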


Case I: Rectangle

Nodes: Rectangle

Two vanishing points, four edge directions

Rules:

F. Han and S. C. Zhu, “Bottom-up/top-down image parsing by attribute graph grammar”. Proceedings of International Conference on Computer Vision, Beijing, China, 2005.

Case I: Rectangle

Get the primal sketch of the scene

Find the ‘strong’ rectangles (bottom-up, red)

Weigh (score) the different hypotheses (top-down, blue)

The weight is the compatibility of the image with the proposed rectangle (primal sketch)

Accept the best one

Repeat the previous three steps until all the weights are small (negative)

F. Han and S. C. Zhu, “Bottom-up/top-down image parsing by attribute graph grammar”. Proceedings of International Conference on Computer Vision, Beijing, China, 2005.

$\sim \exp\left(-\sum_{(x,y) \in \Lambda_k} \frac{(I(x,y) - B(x,y))^2}{2\sigma^2}\right)$
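The propose, weigh, accept loop for the rectangle case can be sketched as a greedy procedure. This is an illustration of the control flow only: the 1-D intervals stand in for rectangles, and `coverage_weight` is a made-up score, not the primal-sketch compatibility used in the paper.

```python
def greedy_parse(candidates, weight):
    """Accept hypotheses one at a time until none has a positive weight.

    candidates: hypotheses from the bottom-up step.
    weight(h, accepted): top-down score of h given what is already accepted.
    """
    remaining, accepted = list(candidates), []
    while remaining:
        scored = [(weight(h, accepted), h) for h in remaining]
        best_w, best = max(scored, key=lambda pair: pair[0])
        if best_w <= 0:
            break
        accepted.append(best)
        remaining.remove(best)
    return accepted

# Hypothetical 1-D stand-in: a "rectangle" is an interval (start, end) and its
# weight is the number of still-unexplained pixels it covers, minus a fixed cost.
def coverage_weight(h, accepted, cost=2):
    explained = {x for a, b in accepted for x in range(a, b)}
    return len(set(range(*h)) - explained) - cost

result = greedy_parse([(0, 5), (3, 8), (4, 6), (10, 11)], coverage_weight)
print(result)
```

Weights are recomputed after every acceptance, so a hypothesis that overlaps an accepted one loses value; this is what makes the loop stop once the remaining weights turn negative.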

Case I: Rectangle Inference process


Case II: Human Cloth

Use the And-Or graph to generate a matching model

Vocabulary (training dataset)

Matching using the And-Or Graph

Case II: Human Cloth

The And-Or Graph

H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proceedings of IEEE Conference on Pattern Recognition and Computer Vision, New York, June 2006.

Novel Configuration

Case II: Human Cloth

Inference process


Localize face, then estimate the parts of the body

Bottom-up: a coarse matching of the parts

Top-down: refine the matching using the relation

Case II: Human Cloth Inference result




Hands are not exactly the same: find the best matching in the dataset

Case III: Recognition

Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu, “Recursive top-down/bottom-up algorithm for object recognition,” Technical Report, Lotus Hill Research Institute, 2007.

Conclusion

Enormous amount of vision knowledge: (And-Or graph)

……

Conclusion

Computational complexity: remains open for scheduling the bottom-up/top-down procedure

Semantic gap:

Learning the And-Or Graph

Learning the vocabulary $\Delta$ and its attributes

After all, we are not supposed to define so many things:

ideal vision words:

what we have now:


Thank you

Zhaoyin Jia
