Where-What Network 3 (WWN-3): Developmental Top-Down ...people.idsia.ch/~luciw/slides/IJCNN2010.pdfMichigan State University 2 Attention with Multiple Contexts •What’s the foreground?

Where-What Network 3 (WWN-3): Developmental Top-Down Attention for Multiple Foregrounds and Complex Backgrounds

Matthew Luciw

www.cse.msu.edu/~luciwmat

Juyang Weng

www.cse.msu.edu/~weng

Embodied Intelligence Lab

www.cse.msu.edu/ei


Michigan State University

Attention with Multiple Contexts

• What’s the foreground?
• “Find the person in a pink shirt and red hat.”
• “Find the person in a brown shirt and gray hat.”


General Purpose Attention and Recognition

• Major issues
   • Complex backgrounds
   • Binding problem
   • Lack of constraints
   • Chicken-and-egg problem
      • Recognition requires segmentation
      • Segmentation requires recognition
• Remains an open problem

Problems with Many Attention-Recognition Methods

• Not utilizing top-down feedback simultaneously with bottom-up, at each layer
   • As in biological visual systems
   • Deal with multiple contexts
   • Learning better features
   • Border ownership and transparency
• Not developmental
   • Using pre-selected rules to find interest points (e.g., corner detection)
   • Detecting and recognizing pre-selected objects
   • Pre-selected architecture (e.g., number of layers and neurons)


Some Other Approaches

• Feature Integration Theory (Treisman 1980): a master map for location?

• Saliency-based (Itti et al. 1998): feature types are pre-selected
   • Bottom-up: traditional; top-down: gain tuning (Backer et al. 2001)

• Shift circuits (Anderson & Van Essen 1987; Olshausen et al. 1993): how were they developed?

• SIFT: requires a pre-selected rule for interest points

• Top-down in connectionist models
   • Visual search and label-based top-down (Deco & Rolls 2004): no top-down in training
   • Selective tuning (Tsotsos et al. 1995) using inhibitory top-down
   • ARTSCAN (Fazl & Grossberg 2007): excitatory top-down, form-fitting “attentional shroud”; potential difficulty with complex backgrounds

Visual System: Rich Bidirectional Connectivity

• Cortical area connectivity, e.g., as seen in Felleman and Van Essen’s study (1993)…

• But… this seems too complicated to model?

Evidence that Areas are Developed from Statistics

• (1): Orientation-selective neurons: internal representation

• (2): Blakemore and Cooper: representation is experience-dependent

• (3): M. Sur: input-driven self-organization and feature development

• Suggests that functional representation is not hardcoded but developed

Consider a Single Area with Bottom-Up and Top-Down

V (bottom-up weight matrix)

M (top-down weight matrix)

For a single neuron:
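A plausible single-neuron pre-response, offered as a sketch under stated assumptions rather than as the slide's own equation: assume that row $\mathbf{v}_i$ of V and row $\mathbf{m}_i$ of M are compared with the bottom-up input $\mathbf{x}$ and the top-down input $\mathbf{y}$ by normalized inner products, that ρ weights the bottom-up term and $1-\rho$ the top-down term, and that g is the activation function. The names $\mathbf{x}$, $\mathbf{y}$ and the exact normalization are assumptions; WWN-3 may combine the two terms differently.

$$
z_i = g\!\left( \rho\,\frac{\mathbf{v}_i \cdot \mathbf{x}}{\|\mathbf{v}_i\|\,\|\mathbf{x}\|} + (1-\rho)\,\frac{\mathbf{m}_i \cdot \mathbf{y}}{\|\mathbf{m}_i\|\,\|\mathbf{y}\|} \right)
$$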

Multilayer Bidirectional Where-What Networks

WWN-3

• Two-way information flow in both training and testing
• Different information-flow parameterizations allow different attention modes (i.e., what-imposed, where-imposed)
• No hardcoded rules for interest points or features: each area learns through Lobe Component Analysis

Each Layer: Lobe Component Analysis

LCA incrementally approximates the joint distribution of bottom-up and top-down input in a dually optimal way

LCA is used to learn the bottom-up and top-down weights in each area

• Weng & Zheng, WCCI, 2006

• Weng & Luciw, TAMD, 2009
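A minimal sketch of an LCA-style incremental update, assuming the commonly described form: cosine pre-responses, top-k winners, and a Hebbian update whose learning rate is an amnesic mean of the neuron's firing age. The helper names `amnesic_mu` and `lca_step` and the amnesic parameters `t1`, `t2`, `c`, `r` are hypothetical choices for illustration. For brevity the sketch uses a single input vector `x`; on the slide, LCA handles bottom-up and top-down input together.

```python
import numpy as np

def amnesic_mu(n, t1=20, t2=200, c=2.0, r=10000.0):
    """Amnesic function mu(n); t1, t2, c, r are illustrative assumptions."""
    if n <= t1:
        return 0.0
    if n <= t2:
        return c * (n - t1) / (t2 - t1)
    return c + (n - t2) / r

def lca_step(V, counts, x, k=1):
    """One incremental LCA-style update (hypothetical helper).

    V: (num_neurons, dim) weights, one neuron per row, updated in place.
    counts: per-neuron firing ages, updated in place.
    x: input vector for this step.
    Returns the layer response after top-k competition.
    """
    norms = np.linalg.norm(V, axis=1) * np.linalg.norm(x)
    pre = (V @ x) / (norms + 1e-12)          # cosine pre-responses
    winners = np.argsort(pre)[-k:]           # top-k lateral competition
    y = np.zeros_like(pre)
    y[winners] = pre[winners]
    for j in winners:
        counts[j] += 1
        n = counts[j]
        w2 = (1.0 + amnesic_mu(n)) / n       # learning rate
        w1 = 1.0 - w2                        # retention rate
        V[j] = w1 * V[j] + w2 * y[j] * x     # Hebbian, response-weighted update
    return y
```

Only the winning neurons learn; on a neuron's first firing `w2` equals 1, so its weight vector is replaced by the response-weighted input, and later updates average in new inputs with a rate that the amnesic term keeps from decaying all the way to 1/n.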

Learned Prototypes

(Above): Example training images from 5 classes, each with 3 rotation variations in depth

Location and type are imposed at the motors

(Right): Response-weighted input of a slice of V4, showing bottom-up sensitivities

The current object representation pathway is limited
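The response-weighted input used in these visualizations can be sketched as follows, assuming it means the average of the inputs a neuron saw, each weighted by that neuron's response (the slides do not give the exact definition). The function name `response_weighted_input` is hypothetical.

```python
import numpy as np

def response_weighted_input(X, R):
    """Response-weighted average input for each neuron (hypothetical helper).

    X: (num_samples, dim) inputs presented to the layer.
    R: (num_samples, num_neurons) non-negative responses to those inputs.
    Returns (num_neurons, dim); row i is sum_t R[t, i] * X[t] / sum_t R[t, i].
    """
    totals = R.sum(axis=0)                      # total response per neuron
    weighted = R.T @ X                          # sum_t R[t, i] * X[t]
    safe = np.where(totals > 0, totals, 1.0)    # silent neurons get an all-zero row
    return weighted / safe[:, None]
```

Each row can be reshaped to the input image size and displayed, which shows what a neuron is most sensitive to.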

Learned Features in IT and PP

IT spatial representation

(a): IT learned type-specific features (here: duck) but allows location variation; shown is the response-weighted input of 4 single neurons

(b): PP learned location-specific features but allows type variation

These effects are enabled by top-down connections during training

Response of a Layer

• V: bottom-up weights

• M: top-down weights

• f: lateral inhibition (approximation); k: number of nonzero firing units

• ρ (rho): relative influence of bottom-up versus top-down

• g: activation function, e.g., sigmoid, tanh, or linear
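The quantities listed above can be put together in a short sketch of one layer's response. Assumptions, none confirmed by the slide: bottom-up and top-down matches are cosine similarities, `rho` weights the bottom-up term and `1 - rho` the top-down term, f is implemented as hard top-k competition, and g is applied only to the winners. The function name `layer_response` is hypothetical.

```python
import numpy as np

def layer_response(V, M, x, y, rho=0.5, k=1, g=np.tanh):
    """Sketch of a layer response with bottom-up and top-down input (hypothetical).

    V: (n, dx) bottom-up weights; M: (n, dy) top-down weights.
    x: bottom-up input; y: top-down input.
    rho: weight on the bottom-up term (assumed convention).
    k: number of neurons allowed to fire (top-k stand-in for lateral inhibition f).
    g: activation function applied to the winners.
    """
    def cosine(W, v):
        norms = np.linalg.norm(W, axis=1) * np.linalg.norm(v)
        return (W @ v) / np.where(norms > 0, norms, 1.0)

    pre = rho * cosine(V, x) + (1.0 - rho) * cosine(M, y)
    winners = np.argsort(pre)[-k:]    # indices of the k largest pre-responses
    z = np.zeros_like(pre)
    z[winners] = g(pre[winners])      # non-winners stay silent
    return z
```

Applying g after the competition keeps losers at exactly zero even when g(0) is nonzero (as with a sigmoid), so k really is the number of nonzero firing units.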

WWN Operates Over Multiple Contexts

V4: “Find the Cat”

Type “cat” imposed at the motor; top-down input from IT (PP has a low weight in search tasks); V4 output to IT and PP

(right) bottom-up response; (below) integration of bottom-up and top-down, top-k (40); top-k (4)

V4: “Find the Pig”

Type “pig” imposed at the motor; top-down input from IT; V4 output to IT and PP

(right) bottom-up response; (below) integration of bottom-up and top-down, top-k (40); top-k (4)
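A toy illustration of the context switching shown in the two V4 examples, built on hypothetical simplifications: V4 is idealized as one neuron per (type, location) pair, bottom-up and top-down inputs are binary, `rho` weights the bottom-up term, and the imposed type excites every V4 neuron of that type. The names `TYPES`, `N_LOC`, and `find` are illustrative; this is not WWN-3's learned representation.

```python
import numpy as np

TYPES = ["cat", "pig"]   # hypothetical type labels
N_LOC = 5                # hypothetical number of image locations

def find(type_name, scene, rho=0.5, k=1):
    """Return the location(s) of the winning V4 neuron(s) when `type_name` is imposed.

    scene: dict mapping location index -> type name present at that location.
    """
    n = len(TYPES) * N_LOC
    bottom_up = np.zeros(n)
    for loc, t in scene.items():
        bottom_up[TYPES.index(t) * N_LOC + loc] = 1.0   # neuron matching what is actually there
    top_down = np.zeros(n)
    t = TYPES.index(type_name)
    top_down[t * N_LOC:(t + 1) * N_LOC] = 1.0           # imposed type boosts all of its locations
    pre = rho * bottom_up + (1.0 - rho) * top_down
    winners = np.argsort(pre)[-k:]                      # top-k competition
    return [int(w % N_LOC) for w in winners]
```

With a cat at location 1 and a pig at location 3, bottom-up input alone gives a tie between the two objects; imposing “cat” breaks the tie in favor of location 1, and imposing “pig” in favor of location 3.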

Attentional Context

Performance Over Learning

Disjoint views used in

testing

Performance with Multiple Objects

Future: Multimodal SASE

• The SASE (self-aware and self-effecting) architecture describes a highly recurrent, multi-sensor, multi-effector brain, in which multi-sensory and multi-effector integration are achieved through developmental learning.

Conclusions

• Novel methods for utilizing top-down excitatory connections in multilayer Hebbian networks

• Top-down connections in WWN-3

• Top-down attention and recognition without a master map or internal “canonical views” (combination neurons)

• Multilayer synchronization

• Top-down context switching based on an internal idea or external percept

• Hopefully contributes to the foundations of online learning based on cortex-inspired methods


Thank You

• Questions


Future: Synaptic Neuromodulation

• Background has high variation; foreground has low variation

• Automatic receptive field learning for larger recognition hierarchies (e.g., V1 <-> V2 <-> V4)
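The premise of the first bullet, that background varies more than foreground across samples of the same object, can be illustrated by thresholding per-component input variance. This is a hypothetical sketch: the helper `low_variance_mask` and the fixed `threshold` are assumptions standing in for whatever neuromodulatory mechanism is intended, not the authors' method.

```python
import numpy as np

def low_variance_mask(samples, threshold):
    """Mark input components whose variance across samples is at most `threshold`.

    samples: (num_samples, dim) inputs collected while the same object is present.
    Returns a boolean mask; True marks low-variance (candidate foreground) components.
    """
    var = samples.var(axis=0)    # per-component variance across samples
    return var <= threshold
```

Components that stay stable while the object is present pass the mask, and components that change from sample to sample (the background) are suppressed.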
