Explaining rankings
Maartje ter Hoeve, University of Amsterdam & Blendle
Maartje ter Hoeve maartje.terhoeve@student.uva.nl
Content
What and why?
Rankings
Blendle
Research questions
Related work
Answering research questions
Discussion and conclusion
Introduction
What is explainability and why is it needed?
Explainability: what and why?
[Figure: diagram with the labels Model, "?", User and Developer]
But what is an explanation?
An explanation needs to faithfully give the underlying cause of an event
Justification
Provide conceptual explanations that do not necessarily expose the underlying structure of the algorithm
Description
Provide conceptual explanations that do expose the underlying structure of the algorithm
Rankings
[Figure: a ranked list of items at positions 1, 2, 3, 4, ..., n - 2, n - 1, n]
How do we explain a ranking?
[Figure: example list of items annotated with numeric scores and ranking positions]
Only looking at the score of an item is not sufficient
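To see why a score alone does not explain a position, consider a minimal sketch. The item names and score lists are made up for illustration: the same score lands at different positions depending on the other items in the list.

```python
def rank_position(scores, item):
    """Return the 1-based position of `item` when items are sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(item) + 1

# Hypothetical scores: article "a" has the same score in both lists.
list_one = {"a": 231, "b": 203, "c": 157}
list_two = {"a": 231, "d": 543, "e": 398}

print(rank_position(list_one, "a"))  # 1
print(rank_position(list_two, "a"))  # 3
```

An explanation of a ranking therefore has to take the rest of the list into account, not only the item itself.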
Blendle
Blendle already has heuristic justifications
We use these as one of our baselines
Research questions
PART 1
RQ1 Do users want to receive explanations of why particular news items are recommended to them?
RQ2 What way of showing reasons for news recommendations do users prefer: textual or visual reasons; a single reason or multiple reasons; apparent or less apparent reasons?
PART 2
RQ3 How do we provide users with easy to understand, uncluttered, listwise explanations?
RQ4 How do we build an explanation system that produces faithful, model-agnostic explanations for the outcome of a ranking algorithm, yet is scalable so that it can run in real time?
RQ5 Does the reading behaviour of users who are provided with model-agnostic listwise explanations for a personalized ranked selection of news articles differ from the reading behaviour of users who are provided with heuristic or pointwise explanations for a personalized ranked selection of news articles?
Related work
LIME (Ribeiro et al., 2016): find a local, faithful explanation for the decision of any classifier
LIME is a baseline of this research
We work with rankings, not classifiers
Therefore we bin our ranking scores so that LIME can be applied: mLIME
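A minimal sketch of the binning step, assuming equal-width bins over the observed score range. The slides do not say which binning scheme mLIME uses; `bin_scores` and the number of bins are hypothetical. Each ranking score becomes a class label, so an explainer built for classifiers can be applied.

```python
def bin_scores(scores, n_bins=3):
    """Map each ranking score to a class label in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when all scores are equal
    labels = []
    for s in scores:
        label = int((s - lo) / width)
        labels.append(min(label, n_bins - 1))  # the maximum score falls in the last bin
    return labels

# Assumed binning scheme, illustrative scores.
scores = [200, 180, 160, 120, 100]
print(bin_scores(scores))  # [2, 2, 1, 0, 0]
```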
Part 1 - User study
Explanation designs: Single Reason - Visible; Single Reason - Invisible; Multiple Reasons - Visible; Multiple Reasons - Combined; Bar chart
Type  Sent  Answered
1     179   41
2     180   36
3     182   43
RQ1 Do users want explanations?
User wants reasons  Times answered
Yes                 65
Somewhat            24
No                  26
I don't know        5
χ² = 14.55, p < 0.001
RQ2 Preferences for how explanations are shown?
[Figure: the explanation designs (Single Reason - Visible, Single Reason - Invisible, Multiple Reasons - Visible, Multiple Reasons - Combined, Bar chart) compared on Transparency, Sufficiency, Trust and Satisfaction]
Part 2
RQ3 How do we make listwise explanations?
[Figure: example list of items annotated with numeric scores and ranking positions]
Explain the entire list?
Explain items in comparison to other items?
Explain which features were important for the position in the ranking?
[Figure: ranked items, each represented by its features f0 ... f4]
Which features are most important for the item's position in the ranking?
RQ4 How do we design this?
Main intuition
If we change feature values and the ranking changes, then this feature was important
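The intuition can be sketched as follows. The linear `score` function and its weights are hypothetical stand-ins for a real ranker; the point is only that a feature counts as important for an item when changing its value moves the item in the ranking.

```python
def score(features, weights=(5.0, 0.0, 1.0)):
    """Hypothetical ranker: a weighted sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))

def position(items, index):
    """1-based rank of items[index] when sorted by descending score."""
    target = score(items[index])
    return 1 + sum(1 for other in items if score(other) > target)

def rank_changes(items, index, feature, new_value):
    """True if setting one feature of one item to new_value changes its position."""
    changed = [list(it) for it in items]
    changed[index][feature] = new_value
    return position(changed, index) != position(items, index)

# Illustrative items with three features each.
items = [[3, 1, 2], [2, 5, 1], [1, 0, 4]]
print(rank_changes(items, 1, feature=0, new_value=4))  # True: f0 moves the item
print(rank_changes(items, 1, feature=1, new_value=0))  # False: f1 has zero weight
```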
LISTEN - LISTwise ExplaiNer
Training phase
Find how feature values change the ranking
Find disruptive distributions and points of interest
Explaining phase
Use distributions to sample feature values from
Find most important features
Return most important features as explanations
Training phase
Find how feature values change the ranking: f0 [1, 2, 3, 4, 5, 6]
[Figure: a ranked list of five items, each with features f0 ... f4, scored 200, 180, 160, 120 and 100. The f0 value of the item scored 160 is set to each candidate value in turn and the list is re-scored.]
f0 value  Score of the perturbed item
1         90
2         110
3         160
4         170
5         190
6         195
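Using the scores from this example, the effect of each f0 value on the item's position can be worked out directly. The variable and function names are illustrative; the numbers are the ones shown in the example.

```python
# Scores of the other four items in the example list.
other_scores = [200, 180, 120, 100]

# Score of the perturbed item for each value of f0 (f0 = 3 reproduces its original score, 160).
score_for_f0 = {1: 90, 2: 110, 3: 160, 4: 170, 5: 190, 6: 195}

def position(item_score, others):
    """1-based position of the item when all scores are sorted in descending order."""
    return 1 + sum(1 for s in others if s > item_score)

positions = {f0: position(s, other_scores) for f0, s in score_for_f0.items()}
print(positions)  # {1: 5, 2: 4, 3: 3, 4: 3, 5: 2, 6: 2}
```

Values 1 and 2 push the item down the list, values 5 and 6 push it up, and values 3 and 4 leave its position unchanged.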
Find disruptive distributions and points of interest
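The slides do not define a disruptive distribution. One possible reading, sketched here purely as an assumption, is a sampling distribution over feature values in which values that moved the item further in the ranking receive more probability mass. The rank shifts below follow from the f0 example (original position 3); the smoothing constant and the function name are hypothetical.

```python
import random

# Absolute change in position for each f0 value, relative to the original position 3.
rank_shift = {1: 2, 2: 1, 3: 0, 4: 0, 5: 1, 6: 1}

def disruptive_distribution(shifts, smoothing=0.1):
    """Turn rank shifts into sampling probabilities; smoothing keeps every value reachable."""
    weights = {value: shift + smoothing for value, shift in shifts.items()}
    total = sum(weights.values())
    return {value: w / total for value, w in weights.items()}

probs = disruptive_distribution(rank_shift)

# Sample feature values for later perturbations, favouring values that disrupt the ranking.
rng = random.Random(0)
samples = rng.choices(list(probs), weights=list(probs.values()), k=5)
print(probs)
print(samples)
```

Sampling mostly from disruptive values means fewer perturbations are spent on values that leave the ranking untouched, which fits the speed-up the slides attribute to this approach.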
Explanation phase
Find most important features
[Figure: the ranked list of five items with scores 200, 180, 160, 120 and 100. After a perturbation the score of the top item drops from 200 to 170, so it falls below the item scored 180.]
[Figure: five ranked items, each represented by its features f0 ... f4]
Return most important features
Is this faithful?
Construct dummy data where we know what to expect
LISTEN gives the correct results
Disruptive distribution approach slightly decreases faithfulness, yet increases speed
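A faithfulness check of this kind can be sketched as follows, assuming a dummy ranker whose score depends on a single known feature. The ranker, the data, the perturbation values and `most_important_feature` are hypothetical; the sketch is not LISTEN itself, only the idea of testing an explainer on data where the expected answer is known.

```python
def dummy_score(features):
    """Dummy ranker: by construction only feature 2 influences the score."""
    return 10.0 * features[2]

def position(items, index, score):
    """1-based rank of items[index] when sorted by descending score."""
    target = score(items[index])
    return 1 + sum(1 for other in items if score(other) > target)

def most_important_feature(items, index, score, values):
    """Perturb each feature in turn and return the one with the largest total rank change."""
    base = position(items, index, score)
    totals = []
    for feature in range(len(items[index])):
        total = 0
        for value in values:
            changed = [list(it) for it in items]
            changed[index][feature] = value
            total += abs(position(changed, index, score) - base)
        totals.append(total)
    return totals.index(max(totals))

# Dummy data: four items with five features each.
items = [
    [1, 4, 6, 2, 3],
    [5, 2, 4, 6, 1],
    [3, 6, 2, 1, 5],
    [2, 1, 5, 3, 4],
]
explained = [most_important_feature(items, i, dummy_score, values=range(1, 7))
             for i in range(len(items))]
print(explained)  # [2, 2, 2, 2]
```

Because the expected answer (feature 2) is known in advance, any other output would indicate an unfaithful explanation.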
This is still not fast enough to run in production
[Figure: items represented by their feature vectors f0, f1, f2, ..., fn-1, fn]
Q-LISTEN
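The conclusion of the talk states that the explanation space is learned with a neural net. A minimal sketch of that idea follows, under strong assumptions: a single softmax layer stands in for the neural net, and `slow_explainer` is a hypothetical stand-in for the slow perturbation-based explainer. The surrogate is trained on the slow explainer's outputs, so that at run time one forward pass replaces many perturbations and re-rankings.

```python
import math
import random

def slow_explainer(x, weights=(3.0, 1.0, 2.0)):
    """Stand-in for a slow explainer: index of the feature with the largest
    contribution to a hypothetical linear ranking score."""
    contributions = [w * v for w, v in zip(weights, x)]
    return contributions.index(max(contributions))

def logits(W, b, x):
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def train(xs, ys, n_classes, epochs=50, lr=0.1):
    """Fit a single softmax layer with stochastic gradient descent on cross-entropy."""
    n_features = len(xs[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            probs = softmax(logits(W, b, x))
            for c in range(n_classes):
                grad = probs[c] - (1.0 if c == y else 0.0)
                b[c] -= lr * grad
                for j in range(n_features):
                    W[c][j] -= lr * grad * x[j]
    return W, b

def fast_explainer(W, b, x):
    """One forward pass through the surrogate instead of many perturbations."""
    zs = logits(W, b, x)
    return zs.index(max(zs))

# Synthetic feature vectors labelled by the slow explainer (illustrative data only).
rng = random.Random(0)
train_xs = [[rng.random() for _ in range(3)] for _ in range(300)]
train_ys = [slow_explainer(x) for x in train_xs]
W, b = train(train_xs, train_ys, n_classes=3)

test_xs = [[rng.random() for _ in range(3)] for _ in range(100)]
agreement = sum(fast_explainer(W, b, x) == slow_explainer(x) for x in test_xs) / len(test_xs)
print(f"agreement with the slow explainer: {agreement:.2f}")
```

The trade-off is the usual one for surrogates: explanations become cheap to serve, but they are only as faithful as the surrogate's agreement with the explainer it imitates.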
Same holds for mLIME
Q-mLIME
[Figure: two feature vectors f0, f1, ..., fn]
RQ5 How do users behave?
A/B/C test
A. Heuristic
B. Q-mLIME
C. Q-LISTEN
Results
Metric                                                            Heuristic  Q-mLIME  Q-LISTEN
% reasons seen (of total reads)                                   3.55       3.54     3.51
% reasons seen (of total reasons)                                 34.1       32.9     32.0
Reasons per user (of users that see reasons)                      1.83       1.86     1.72
Article opened within 2 minutes (% of all reasons seen)           12.1       12.9     10.6
Reasons seen after 20 minutes of opening (% of all reasons seen)  11.0       10.1     10.7
Conclusion and discussion
PART 1
RQ1 Do users want to receive explanations of why particular news items are recommended to them?
Yes
RQ2 What way of showing reasons for news recommendations do users prefer: textual or visual reasons; a single reason or multiple reasons; apparent or less apparent reasons?
No specific one
PART 2
RQ3 How do we provide users with easy to understand, uncluttered, listwise explanations?
We look at which features were most important for an item's position in the ranking
RQ4 How do we build an explanation system that produces faithful, model-agnostic explanations for the outcome of a ranking algorithm, yet is scalable so that it can run in real time?
We find feature importance by changing the feature values and observing how the ranking changes
We make disruptive distributions
We learn the explanation space with a neural net
RQ5 Does the reading behaviour of users who are provided with model-agnostic listwise explanations for a personalized ranked selection of news articles differ from the reading behaviour of users who are provided with heuristic or pointwise explanations for a personalized ranked selection of news articles?
We do not find any significant differences
Take home message
Users clearly state that they want explanations. Therefore, even though their behaviour is not affected by the explanations, still provide them with faithful explanations.
Reasons seen
Loss and accuracy curves
mLIME vs LISTEN
Active and less active users