Where-What Network 3 (WWN-3):
Developmental Top-Down Attention
for Multiple Foregrounds and
Complex Backgrounds
Matthew Luciw
www.cse.msu.edu/~luciwmat
Juyang Weng
www.cse.msu.edu/~weng
Embodied Intelligence Lab
www.cse.msu.edu/ei
Michigan State University 2
Attention with Multiple Contexts
• What’s the foreground?
• “Find the person in a pink shirt and red hat.”
• “Find the person in a brown shirt and gray hat.”
General Purpose Attention and Recognition
• Major Issues
• Complex backgrounds
• Binding problem
• Lack of constraints
• Chicken-egg problem
• Recognition requires segmentation
• Segmentation requires recognition
• Remains an open problem
Problems with Many Attention-Recognition Methods
• Not utilizing top-down feedback simultaneously with bottom-up, at each layer
• As in biological visual systems
• Deal with multiple contexts
• Learning better features
• Border ownership and transparency
• Not developmental
• Using pre-selected rules to find interest points (e.g., corner detection)
• Detect and recognize pre-selected objects
• Pre-selected architecture (e.g., # layers and neurons)
Some Other Approaches
• Feature Integration Theory (Treisman 1980): a master map for location?
• Saliency-based (Itti et al. 1998): feature types pre-selected
• Bottom-up: traditional; top-down: gain-tuning (Backer et al. 2001)
• Shift circuits (Anderson & Van Essen 1987, Olshausen et al. 1993): how were they developed?
• SIFT: requiring pre-selected rule for interest points
• Top-down in connectionist models
• Visual search and label-based top-down (Deco & Rolls, 2004): no top-down in training
• Selective tuning (Tsotsos et al. 1995) using inhibitory top-down
• ARTSCAN (Fazl & Grossberg, 2007): excitatory top-down, form-fitting “attentional shroud”; potential difficulty with complex backgrounds
Visual System: Rich Bidirectional Connectivity
• Cortical area connectivity, e.g., as seen in Felleman and Van Essen’s study (1991)…
• But… this seems too complicated to model?
Evidence that Areas are Developed from Statistics
• (1): Orientation selective neurons: internal representation
• (2): Blakemore and Cooper: representation is experience dependent
• (3): M. Sur: Input-driven self-organization and feature development
• Suggests functional representation is not hardcoded, but developed
Consider a Single Area with Bottom-Up and Top-Down
V (bottom-up weight matrix)
M (top-down weight matrix)
For a single neuron:
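One plausible way to write a single neuron's response with the symbols used in this deck is sketched below. It assumes normalized inner products and a convex mix controlled by rho. The input names x (bottom-up) and z (top-down), the normalization, and which side rho multiplies are assumptions for illustration; v_i and m_i denote row i of V and M.

```latex
\[
  y_i \;=\; g\!\left(
      \rho \,\frac{\mathbf{v}_i \cdot \mathbf{x}}{\|\mathbf{v}_i\|\,\|\mathbf{x}\|}
      \;+\; (1-\rho)\,\frac{\mathbf{m}_i \cdot \mathbf{z}}{\|\mathbf{m}_i\|\,\|\mathbf{z}\|}
  \right)
\]
```

Competition between neurons (the lateral inhibition f) acts at the layer level and is left out of this per-neuron form.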
Multilayer Bidirectional Where-What Networks
WWN-3
• Two-way information flow in both training and testing
• Different information flow parameterizations allow different attention modes (i.e., what-imposed, where-imposed)
• No hardcoded rules for interest points or features: each area learns through Lobe Component Analysis
Each Layer: Lobe Component Analysis
LCA incrementally approximates the joint distribution of bottom-up and top-down inputs, in a dually optimal way
LCA is used to learn the bottom-up and top-down weights in each area
• Weng & Zheng, WCCI, 2006
• Weng & Luciw, TAMD, 2009
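A minimal sketch of an LCA-style incremental step, to make the idea concrete: neurons compete on a joint (bottom-up + top-down) input, and only the winners move toward it with a Hebbian average. This is a simplification, not the published algorithm: it uses a plain 1/age learning rate where LCA uses an amnesic schedule, and the names `lca_update`, `weights`, and `ages` are hypothetical.

```python
import numpy as np

def lca_update(weights, ages, x, k=1):
    """One simplified LCA-style step (in place): top-k competition, then Hebbian averaging.

    weights: (n_neurons, dim) float array, one row per neuron
    ages:    (n_neurons,) update counts, assumed to start at 1
    x:       (dim,) joint input, e.g. concatenated bottom-up and top-down vectors
    """
    x = x / (np.linalg.norm(x) + 1e-12)                 # unit-length input
    norms = np.linalg.norm(weights, axis=1) + 1e-12
    responses = (weights @ x) / norms                   # cosine-like match per neuron
    winners = np.argsort(responses)[-k:]                # k best-matching neurons
    for j in winners:
        ages[j] += 1
        lr = 1.0 / ages[j]                              # assumption: plain running mean, no amnesic term
        weights[j] = (1.0 - lr) * weights[j] + lr * responses[j] * x
    return winners

# Usage: 4 neurons learn from 6-dim joint inputs (4 bottom-up + 2 top-down values).
rng = np.random.default_rng(0)
weights = rng.random((4, 6))
ages = np.ones(4)
for _ in range(200):
    bottom_up = rng.random(4)
    top_down = np.array([1.0, 0.0])                     # an imposed motor pattern
    lca_update(weights, ages, np.concatenate([bottom_up, top_down]))
```

Because the top-down part is included in the input vector, each winner's weights come to reflect both what was seen and what was imposed at the motor.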
Learned Prototypes
(Above): Example training images, from 5 classes with 3 rotation variations in depth
Location and Type are imposed at the motors
(Right): response-weighted input of a slice of V4, showing bottom-up sensitivities
Current object representation pathway is limited
Learned Features in IT and PP
IT spatial representation
(a): IT learned type-specific features (here: duck) that allow location variations; shown is the response-weighted input of 4 single neurons
(b): PP learned location-specific features that allow type variation
These effects are enabled by top-down connections in training
Response of a Layer
• V: bottom-up weights
• M: top-down weights
• f: approximation of lateral inhibition; k is the number of nonzero firing units
• rho: relative influence of bottom-up to top-down
• g: activation function, e.g., sigmoid, tanh, or linear
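The symbols above can be put together in a small sketch of a layer's response. The exact formula is not reproduced here, so the combination below is an assumption: normalized bottom-up and top-down matches are mixed by rho, top-k competition stands in for f, and g is applied to the winners only. The function and variable names are hypothetical.

```python
import numpy as np

def unit(a):
    """Scale vectors (last axis) to unit length."""
    return a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-12)

def layer_response(V, M, x, z, rho=0.5, k=1, g=np.tanh):
    """Assumed form: g applied to the top-k of rho*bottom-up + (1-rho)*top-down."""
    bottom_up = unit(V) @ unit(x)            # match of each neuron to the input x
    top_down = unit(M) @ unit(z)             # match of each neuron to the motor context z
    pre = rho * bottom_up + (1.0 - rho) * top_down
    y = np.zeros_like(pre)
    winners = np.argsort(pre)[-k:]           # f: only the k strongest units fire
    y[winners] = g(pre[winners])
    return y

# Usage: the same input, two imposed types, two different winners.
V = np.eye(4)                                # neuron i prefers input feature i
M = np.array([[1.0, 0.0], [1.0, 0.0],        # neurons 0, 1 are tied to type "cat"
              [0.0, 1.0], [0.0, 1.0]])       # neurons 2, 3 are tied to type "pig"
x = np.array([0.0, 1.0, 1.0, 0.0])           # features of both objects are present
cat, pig = np.array([1.0, 0.0]), np.array([0.0, 1.0])

print(np.argmax(layer_response(V, M, x, cat)))   # neuron 1 wins under "cat"
print(np.argmax(layer_response(V, M, x, pig)))   # neuron 2 wins under "pig"
```

With rho near 1 the layer follows the image alone; lowering rho lets the imposed type decide which of the equally supported features wins.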
WWN Operates Over Multiple Contexts
V4: “Find the Cat”
• Type: cat, imposed at motor
• Top-down from IT (PP has a low weight in search tasks); output to IT and PP
• Shown: bottom-up response; integration of bottom-up and top-down; top-k (40); top-k (4)
V4: “Find the Pig”
• Type: pig, imposed at motor
• Top-down from IT; output to IT and PP
• Shown: bottom-up response; integration of bottom-up and top-down; top-k (40); top-k (4)
Attentional Context
Performance Over Learning
Disjoint views used in testing
Performance with Multiple Objects
Future: Multimodal SASE
• The SASE (self-aware and self-effecting) architecture
describes a highly recurrent architecture of a multi-sensor,
multi-effector brain. Multi-sensory and multi-effector
integration are achieved through developmental learning.
Conclusions
• Novel methods for utilizing top-down excitatory connections in multilayer Hebbian networks
• Top-down connections in WWN-3
• Top-down attention and recognition without a master map or internal “canonical views” (combination neurons)
• Multilayer synchronization
• Top-down context switching based on an internal idea or external percept
• Hopefully contributes to the foundations of online learning based on cortex-inspired methods
Thank You
• Questions
Future: Synaptic Neuromodulation
• Background has high variation, foreground has low variation
• Automatic receptive field learning for larger recognition hierarchies (e.g., V1 <-> V2 <-> V4)