Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Modeling Conceptual Understanding in Image Reference Games
Prof. Dr. Zeynep Akata
Interpretable ML Tutorial at CVPR 2020
15 June 2020
1
Outline
Background: Explanation and Learning Are Related
Modeling Conceptual Understanding With Image Reference Games
Conclusion: Explaining Through Communication Is Exciting
2
Outline
Background: Explanation and Learning Are Related
Modeling Conceptual Understanding With Image Reference Games
Conclusion: Explaining Through Communication Is Exciting
3
Learning via Explanation Lombrozo TICS’16
4
Learning via Explanation Lombrozo TICS’16
4
Learning via Explanation Lombrozo TICS’16
4
Learning via Explanation Lombrozo TICS’16
4
Learning via Explanation Lombrozo TICS’16
4
Attributes as Explanations Lampert et al. CVPR’09
attributes classes
Red crownOrange bellyBlack eyes
Blue crownWhite bellyBlack eyes
Eastern bluebird
[1 1 1 0 0]
Cardinal[0 0 1 1 1]
images
5
Attributes as Explanations Lampert et al. CVPR’09
attributes classes
Red crownOrange bellyBlack eyes
Blue crownWhite bellyBlack eyes
Eastern bluebird
[1 1 1 0 0]
Cardinal[0 0 1 1 1]
images
5
Attributes as Explanations Lampert et al. CVPR’09
attributes classes
Red crownOrange bellyBlack eyes
Blue crownWhite bellyBlack eyes
Eastern bluebird
[1 1 1 0 0]
Cardinal[0 0 1 1 1]
images
5
Natural Language as Explanations for Communication
6
Natural Language as Explanations for Communication
What type of bird is this?
6
Natural Language as Explanations for Communication
What type of bird is this?
It is a Cardinal
What type of bird is this?
It is a Cardinal because it is a red bird with a red beak and a black face
Why not a Vermilion
Flycatcher?
It is not a Vermilion Flycatcher because it does not have black wings.
6
Natural Language as Explanations for Communication
What type of bird is this?
It is a Cardinal because it is a red bird with a red beak and a black face
6
Natural Language as Explanations for Communication
What type of bird is this?
It is a Cardinal because it is a red bird with a red beak and a black face
Why not a Vermilion
Flycatcher?
6
Natural Language as Explanations for Communication
What type of bird is this?
It is a Cardinal because it is a red bird with a red beak and a black face
Why not a Vermilion
Flycatcher?
It is not a Vermilion Flycatcher because it does not have black wings.
6
Grounding Visual Explanations Hendricks et al. ECCV’16 & ECCV’18
Explanation Sampler
This red bird has a red beak and a black face.
7
Grounding Visual Explanations Hendricks et al. ECCV’16 & ECCV’18
Explanation Sampler
attribute chunker
This red bird has a red beak and a black face.
Explanation Grounder
red birdred beak
black face
7
Grounding Visual Explanations Hendricks et al. ECCV’16 & ECCV’18
Explanation Sampler
attribute chunker
attribute chunker
This red bird has a red beak and a black face.
This red bird has a black beak and a black face.
Explanation Grounder
red birdblack beak
black face
red birdred beak
black face
7
Grounding Visual Explanations Hendricks et al. ECCV’16 & ECCV’18
Explanation Sampler
1.02
attribute chunker
2.05
attribute chunker
This red bird has a red beak and a black face.
This red bird has a black beak and a black face.
Phrase-CriticExplanation Grounder
red beak black
facered bird
black face
red birdblack beak
red birdblack beak
black face
red birdred beak
black face
7
Rational Quantitative Attribution of Beliefs, Desires and Percepts inHuman Mentalizing Baker et al. Nature’17
8
Rational Quantitative Attribution of Beliefs, Desires and Percepts inHuman Mentalizing Baker et al. Nature’17
8
Rational Quantitative Attribution of Beliefs, Desires and Percepts inHuman Mentalizing Baker et al. Nature’17
8
Rational Quantitative Attribution of Beliefs, Desires and Percepts inHuman Mentalizing Baker et al. Nature’17
8
Machine Theory of Mind Rabinowitz et al. ICML’18
9
Machine Theory of Mind Rabinowitz et al. ICML’18
9
Machine Theory of Mind Rabinowitz et al. ICML’18
9
Machine Theory of Mind Rabinowitz et al. ICML’18
9
M3RL: Mind-aware Multi-agent Management ReinforcementLearning Shu et al. ICLR’19
10
M3RL: Mind-aware Multi-agent Management ReinforcementLearning Shu et al. ICLR’19
10
M3RL: Mind-aware Multi-agent Management ReinforcementLearning Shu et al. ICLR’19
10
Outline
Background: Explanation and Learning Are Related
Modeling Conceptual Understanding With Image Reference Games
Conclusion: Explaining Through Communication Is Exciting
11
Image Reference Games with Failure in Concept Understanding
Round
Red ???
12
Image Reference Games with Failure in Concept Understanding
Round
Red ???
12
Image Reference Games with Failure in Concept Understanding
RoundRed ???
12
Modeling Conceptual Understanding Corona, Alaniz, Akata NeurIPS’19
Speaker Listener (color-blind)
Red beak
Speaker Listener (color-blind)
-
Reward
“It’s image ”
+1 -1Agent
Embedding
= Cone beak
Cone beak
Yellow feet
Red beak
Red beak
Cone beak
Yellow feetCone beakRed beak
Yellow feetCone beak
Red beak-
Red beakYellow feet
Cone beak
Yellow feet
Yellow feet
Cone beak
or
13
Modeling Conceptual Understanding Corona, Alaniz, Akata NeurIPS’19
Speaker Listener (color-blind)
Red beak
Speaker Listener (color-blind)
-
Reward
“It’s image ”
+1 -1Agent
Embedding
= Cone beak
Red beak
Cone beak
Yellow feetRed beak
Cone beak
Cone beak
Yellow feetRed beak
Cone beakRed beak
Yellow feet
Cone beak
Yellow feet
Yellow feet
Red beak
Red beak
Cone beak
Yellow feetCone beakRed beak
Yellow feetCone beak
Red beak-
Red beakYellow feet
Cone beak
Yellow feet
Yellow feet
Cone beak
or
13
Modeling Conceptual Understanding Corona, Alaniz, Akata NeurIPS’19
Speaker Listener (color-blind)
Red beak
Speaker Listener (color-blind)
-
Reward
“It’s image ”
+1 -1Agent
Embedding
= Cone beak
Red beak
Cone beak
Yellow feetRed beak
Cone beak
Cone beak
Yellow feetRed beak
Cone beakRed beak
Yellow feet
Cone beak
Yellow feet
Yellow feet
Red beak
Red beak
Cone beak
Yellow feetCone beakRed beak
Yellow feetCone beak
Red beak-
Red beakYellow feet
Cone beak
Yellow feet
Yellow feet
Cone beak
or
-
Agent Embedding
= Cone beak Red beak
Cone beak
Yellow feet
Red beak-Yellow feet
Cone beak
13
Modeling Conceptual Understanding Corona, Alaniz, Akata NeurIPS’19
Speaker Listener (color-blind)
Red beak
Speaker Listener (color-blind)
-
Reward
“It’s image ”
+1 -1Agent
Embedding
= Cone beak
Red beak
Cone beak
Yellow feetRed beak
Cone beak
Cone beak
Yellow feetRed beak
Cone beakRed beak
Yellow feet
Cone beak
Yellow feet
Yellow feet
Red beak
Red beak
Cone beak
Yellow feetCone beakRed beak
Yellow feetCone beak
Red beak-
Red beakYellow feet
Cone beak
Yellow feet
Yellow feet
Cone beak
or
-
Agent Embedding
Reward
“It’s image ”
+1 -1
= Cone beak Red beak
Cone beak
Yellow feet
Red beak-Yellow feet
Cone beak
13
Perceptual Modules (PM)
Speaker Listener (color-blind)
Red beak
Speaker Listener (color-blind)
-
Reward
“It’s image ”
+1 -1Agent
Embedding
= Cone beak
Red beak
Cone beak
Yellow feetRed beak
Cone beak
Cone beak
Yellow feetRed beak
Cone beakRed beak
Yellow feet
Cone beak
Yellow feet
Yellow feet
Red beak
Red beak
Cone beak
Yellow feetCone beakRed beak
Yellow feetCone beak
Red beak-
Red beakYellow feet
Cone beak
Yellow feet
Yellow feet
Cone beak
or
-
Agent Embedding
Reward
“It’s image ”
+1 -1
= Cone beak Red beak
Cone beak
Yellow feet
Red beak-Yellow feet
Cone beak
1. Extract image-level features using a CNN
2. Predict attribute-level features
φ(x) = f(CNN(x))
• each element in φ(x) ∈ [0, 1]|A|
represents a separate attribute
• |A|: # of visual attribute labels
• Speaker is one: φS ,Multiple listeners: φL
attributes classes
Red crownOrange bellyBlack eyes
Blue crownWhite bellyBlack eyes
Eastern bluebird
[1 1 1 0 0]
Cardinal[0 0 1 1 1]
images
14
Perceptual Modules (PM)
Speaker Listener (color-blind)
Red beak
Speaker Listener (color-blind)
-
Reward
“It’s image ”
+1 -1Agent
Embedding
= Cone beak
Red beak
Cone beak
Yellow feetRed beak
Cone beak
Cone beak
Yellow feetRed beak
Cone beakRed beak
Yellow feet
Cone beak
Yellow feet
Yellow feet
Red beak
Red beak
Cone beak
Yellow feetCone beakRed beak
Yellow feetCone beak
Red beak-
Red beakYellow feet
Cone beak
Yellow feet
Yellow feet
Cone beak
or
-
Agent Embedding
Reward
“It’s image ”
+1 -1
= Cone beak Red beak
Cone beak
Yellow feet
Red beak-Yellow feet
Cone beak
1. Extract image-level features using a CNN
2. Predict attribute-level features
φ(x) = f(CNN(x))
• each element in φ(x) ∈ [0, 1]|A|
represents a separate attribute
• |A|: # of visual attribute labels
• Speaker is one: φS ,Multiple listeners: φL
attributes classes
Red crownOrange bellyBlack eyes
Blue crownWhite bellyBlack eyes
Eastern bluebird
[1 1 1 0 0]
Cardinal[0 0 1 1 1]
images
14
Agent Embedding (AE)
Speaker: Select attribute ak from
zas = φaS(xkt )− φaS(xkc ).
Listener: Select attribute ak from
zal = φaL(xkt )− φaL(xkc ).
receives reward rk ∈ {−1, 1}
Speaker Listener (color-blind)
Red beak
Speaker Listener (color-blind)
-
Reward
“It’s image ”
+1 -1Agent
Embedding
= Cone beak
Red beak
Cone beak
Yellow feetRed beak
Cone beak
Cone beak
Yellow feetRed beak
Cone beakRed beak
Yellow feet
Cone beak
Yellow feet
Yellow feet
Red beak
Red beak
Cone beak
Yellow feetCone beakRed beak
Yellow feetCone beak
Red beak-
Red beakYellow feet
Cone beak
Yellow feet
Yellow feet
Cone beak
or
-
Agent Embedding
Reward
“It’s image ”
+1 -1
= Cone beak Red beak
Cone beak
Yellow feet
Red beak-Yellow feet
Cone beak
AE module: LSTM, AE hk: LSTM hidden state
hk = LSTM(hk−1, ok)
ok: One-hot vector, the index of the non-zero entryis ak and its value is rk.
15
Agent Embedding (AE)
Speaker: Select attribute ak from
zas = φaS(xkt )− φaS(xkc ).
Listener: Select attribute ak from
zal = φaL(xkt )− φaL(xkc ).
receives reward rk ∈ {−1, 1}
Speaker Listener (color-blind)
Red beak
Speaker Listener (color-blind)
-
Reward
“It’s image ”
+1 -1Agent
Embedding
= Cone beak
Red beak
Cone beak
Yellow feetRed beak
Cone beak
Cone beak
Yellow feetRed beak
Cone beakRed beak
Yellow feet
Cone beak
Yellow feet
Yellow feet
Red beak
Red beak
Cone beak
Yellow feetCone beakRed beak
Yellow feetCone beak
Red beak-
Red beakYellow feet
Cone beak
Yellow feet
Yellow feet
Cone beak
or
-
Agent Embedding
Reward
“It’s image ”
+1 -1
= Cone beak Red beak
Cone beak
Yellow feet
Red beak-Yellow feet
Cone beak
AE module: LSTM, AE hk: LSTM hidden state
hk = LSTM(hk−1, ok)
ok: One-hot vector, the index of the non-zero entryis ak and its value is rk.
15
Policy Learning
Speaker Listener (color-blind)
Red beak
Speaker Listener (color-blind)
-
Reward
“It’s image ”
+1 -1Agent
Embedding
= Cone beak
Red beak
Cone beak
Yellow feetRed beak
Cone beak
Cone beak
Yellow feetRed beak
Cone beakRed beak
Yellow feet
Cone beak
Yellow feet
Yellow feet
Red beak
Red beak
Cone beak
Yellow feetCone beakRed beak
Yellow feetCone beak
Red beak-
Red beakYellow feet
Cone beak
Yellow feet
Yellow feet
Cone beak
or
-
Agent Embedding
Reward
“It’s image ”
+1 -1
= Cone beak Red beak
Cone beak
Yellow feet
Red beak-Yellow feet
Cone beak
Concatenate image-pair difference and AE
sk =[φ(xkt )− φ(xkc );hk
]Predict V (sk, ak) of using ak to describe xkt :
LV =1
N +M
∑N+M
MSE(V (sk, ak), rk)
16
Policy Learning: Different Policies Implemented Here
1. Epsilon Greedy Policy: Randomly sample ak with prob. ε or greedily choose ak
ak = argmaxa∈A
V (sk, a)
2. Active Policy: Train using policy gradient
La =1
N
∑N
−R log πS(st, at) with R = − 1
M
∑M
MSE(V (sk, ak), rk) (1)
3. Random Agent policy: Always select ak at random4. Reactive policy: Select ak at random, if rk = −1 sample a different ak5. Random Sampling: Select ak at random during N + greedy during M
17
Comparing Learned Policies vs Baselines
18
Comparing Learned Policies vs Baselines
18
Comparing Learned Policies vs Baselines
18
Comparing Learned Policies vs Baselines
18
Showing Necessity of Agent Embeddings
19
Evaluating Cluster Quality
1. Generate AE in 50K episodes
2. Perform K-Means on AE: C ′ (GT = C)
3. Evaluate: variation of information (VI)
V I(C,C ′) = H(C) +H(C ′)− 2I(C,C ′)
where H: entropy, I: mutual information
• V I measures amount of informationneeded to switch from C to C ′
20
Modeling Conceptual Understanding Qualitative Results
21
Modeling Conceptual Understanding Qualitative Results
21
Modeling Conceptual Understanding Qualitative Results
21
Outline
Background: Explanation and Learning Are Related
Modeling Conceptual Understanding With Image Reference Games
Conclusion: Explaining Through Communication Is Exciting
22
Conclusions
Modeling conceptual understanding is necessary to succeed in some tasks
1. Formulation for modeling other agents’ understanding
2. Allows XAI systems to tailor their explanations to the specific users
3. Learned AEs recovers a clustering over other agents’ conceptual understanding
Modeling Conceptual Understanding in Image Reference GamesRodolfo Corona, Stephan Alaniz and Zeynep Akata
published at NeurIPS 2019
23
Baker, C. L., Jara-Ettinger, J., Saxe, R., and Tenenbaum, J. B. (2017).
Rational quantitative attribution of beliefs, desires and percepts in human mentalizing.
Nature Human Behaviour, 1.
Christoph H. Lampert, H. N. and Harmeling, S. (2014).
Attribute-based classification for zero-shot visual object categorization.
IEEE TPAMI.
Corona, R., Alaniz, S., and Akata, Z. (2019).
Modeling conceptual understanding in image reference games.
In Neural Information Processing Systems (NeurIPS).
Hendricks, L.-A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., and Darrell, T. (2016).
Generating visual explanations.
In European Conference of Computer Vision (ECCV).
Hendricks, L. A., Hu, R., Darrell, T., and Akata, Z. (2018).
Grounding visual explanations.
In European Conference of Computer Vision (ECCV).
24
Lombrozo, T. (2016).
Explanatory preferences shape learning and inference.
In Trends in Cognitive Science.
Rabinowitz, N., Perbet, F., Song, F., Zhang, C., Eslami, S. M. A., and Botvinick, M. (2018).
Machine theory of mind.
In ICML.
Reed, S., Akata, Z., Lee, H., and Schiele, B. (2016).
Learning deep representations of fine-grained visual descriptions.
In IEEE Computer Vision and Pattern Recognition (CVPR).
Shu, T. and Tian, Y. (2019).
M3RL: Mind-aware multi-agent management reinforcement learning.
In ICLR.
Thank you!
25