Top-down Neural Attention
Selective Attention from a Deep Neural Net
General's Family by Octavio Ocampo
Background
Understanding Artificial Neural Networks
© Jianming Zhang, derivative work. Original image credit: soul wind / stock.adobe.com
Problem Definition
Deep CNN
• animal
• elephant
• zebra
• grass
• africa
elephant
Top-down Attention Map
Top-down Signal
Probabilistic Winner-Take-All
[1] Tsotsos et al. “Modeling Visual Attention via Selective Tuning.” Artificial Intelligence, 1995.
Winner-Take-All [1]
Marginal Winning Probability (MWP): equivalent to an absorbing Markov chain process.
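As a hedged illustration, the MWP can be written as a marginalization over parent neurons. The notation is assumed here and does not appear on the slides: a_j is a neuron, \mathcal{P}_j is the set of its parent neurons in the layer above, and P(a_j \mid a_i) is the probability that a_j wins given that its parent a_i is a winner.

```latex
P(a_j) \;=\; \sum_{a_i \in \mathcal{P}_j} P(a_j \mid a_i)\, P(a_i)
```

Read this way, the conditional probabilities act as transition probabilities of a random walk that starts at the output layer and moves downward, which is one way to see the Markov chain connection.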
output layer
Probabilistic WTA
Excitation Backprop
Assumptions:
• The response of the activation neuron is non-negative.
• An activation neuron is tuned to detect certain visual features. Its response is positively correlated with its detection confidence.
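Under these two assumptions, one layer of top-down propagation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the list-of-lists weight layout (weights[i][j] connects child j to parent i), and the rule that a parent's probability is split among its children in proportion to activation times positive weight are all assumptions made for this sketch.

```python
def excitation_backprop_layer(activations, weights, parent_probs):
    """Propagate winning probabilities from Layer N down to Layer N-1.

    activations:  non-negative responses a_j of the lower layer
    weights:      weights[i][j] is the weight from child j to parent i
    parent_probs: winning probability P(a_i) of each parent neuron
    """
    n_children = len(activations)
    child_probs = [0.0] * n_children
    for i, p_parent in enumerate(parent_probs):
        # Only excitatory (positive-weight) connections compete;
        # inhibitory connections contribute nothing.
        excitation = [activations[j] * max(weights[i][j], 0.0)
                      for j in range(n_children)]
        total = sum(excitation)
        if total == 0.0:
            continue  # no excitatory input: this parent's mass is dropped
        for j in range(n_children):
            child_probs[j] += p_parent * excitation[j] / total
    return child_probs


# Hypothetical toy layer: three child neurons, two parent neurons.
activations = [1.0, 2.0, 1.0]
weights = [[1.0, 1.0, -1.0],   # parent 0 is inhibited by child 2
           [0.0, 0.5, 1.0]]    # parent 1 ignores child 0
probs = excitation_backprop_layer(activations, weights, [0.5, 0.5])
```

When every parent has at least one excitatory input, the child probabilities sum to the same total as the parent probabilities, so the map stays normalized from layer to layer.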
[Figure: connections between Activation Layer N-1 and Activation Layer N, with excitatory (+) and inhibitory (−) neurons marked]
A Common Issue: Insensitiveness to Top-down Signals
zebra elephant
Dominant neurons always win
Contrastive Attention
zebra elephant
elephant zebra
Negating the Output Layer for Contrastive Signals
zebra classifier
non-zebra classifier
zebra map non-zebra map
Thanks to our Excitation Backprop formulation:
• The contrastive attention map can be computed in a single pass.
• The pair of maps is well normalized under our probabilistic framework.
• The pair of maps is positive-valued.
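The idea of pairing a classifier with its negated counterpart can be sketched for a single output neuron. Everything named here is hypothetical: the helper names, the toy weights, and in particular the final step of subtracting the two maps and truncating negatives at zero, which is an assumption of this sketch rather than something stated on the slide.

```python
def winning_probs(activations, class_weights):
    """Distribute one output neuron's probability over its child neurons,
    in proportion to activation times positive weight."""
    excitation = [a * max(w, 0.0) for a, w in zip(activations, class_weights)]
    total = sum(excitation)
    return [e / total if total > 0.0 else 0.0 for e in excitation]


def contrastive_maps(activations, class_weights):
    """Return (target map, negated-classifier map, contrastive map)."""
    target = winning_probs(activations, class_weights)
    # The "non-zebra" classifier is modelled by negating the output weights.
    negated = winning_probs(activations, [-w for w in class_weights])
    # Assumed combination rule: difference of the pair, negatives clipped.
    contrast = [max(t - n, 0.0) for t, n in zip(target, negated)]
    return target, negated, contrast


# Hypothetical toy values for a "zebra" output neuron with three inputs.
activations = [2.0, 1.0, 1.0]
zebra_weights = [1.0, 1.0, -2.0]
zebra_map, non_zebra_map, contrast_map = contrastive_maps(activations, zebra_weights)
```

In this one-layer toy case the two maps never overlap, so the subtraction changes nothing; in a deeper network both maps would be propagated further down and could overlap, which is where the difference becomes informative.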
Evaluation: The Pointing Game
• Task:
  › Given an image and an object category, point to the targets.
• Metric:
  › Pointing accuracy.
  › Pointing anywhere on the targets is fine.
• Dataset:
  › VOC07 (20 categories)
  › COCO (80 categories)
• CNN Models:
  › CNN-S [Chatfield et al. BMVC’14]
  › VGG16 [Simonyan et al. ICLR’15]
  › GoogLeNet [Szegedy et al. CVPR’15]
• Model training:
  › Multi-label cross-entropy loss
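The pointing metric described above can be sketched as follows. This is a simplified illustration under stated assumptions: the pointed location is taken to be the maximum of the attention map, a hit requires that exact pixel to lie on the target mask, and the function names and toy data are hypothetical.

```python
def pointing_hit(attention_map, target_mask):
    """True if the maximum of the attention map lies on a target pixel."""
    best_value, best_pos = None, None
    for r, row in enumerate(attention_map):
        for c, value in enumerate(row):
            if best_value is None or value > best_value:
                best_value, best_pos = value, (r, c)
    r, c = best_pos
    return bool(target_mask[r][c])


def mean_pointing_accuracy(hits_per_category):
    """Average the per-category hit rate over all object categories."""
    rates = [sum(hits) / len(hits)
             for hits in hits_per_category.values() if hits]
    return sum(rates) / len(rates)


# Hypothetical 2x2 attention map and binary mask of the target object.
attention_map = [[0.1, 0.2],
                 [0.6, 0.1]]
target_mask = [[0, 0],
               [1, 0]]
hit = pointing_hit(attention_map, target_mask)

# Hypothetical per-category outcomes over a handful of test images.
accuracy = mean_pointing_accuracy({"zebra": [True, False],
                                   "elephant": [True, True]})
```

Averaging per category rather than per image keeps frequent categories from dominating the score.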
credit: elena milevska / stock.adobe.com
Results
Mean Accuracy over Object Categories in the Pointing Game
Qualitative Comparison
Text-to-Region Association
• Visualizing the top-down attention of a CNN classifier for ~18K tags.