Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Explaining rankings
Maartje ter HoeveUniversity of Amsterdam & Blendle
Maartje ter Hoeve [email protected]
C o n t e n t
Maartje ter Hoeve [email protected]
What and why?
RankingsResearch questions
Related workAnswering research questions
Discussion and Conclusion
Blendle
C o n t e n t
Maartje ter Hoeve [email protected]
What and why?
RankingsResearch questions
Related workAnswering research questions
Discussion and conclusion
Blendle
I n t r o d u c t i o n
Maartje ter Hoeve [email protected]
What is explainability and why is it needed?
Maartje ter Hoeve [email protected]
E x p l a i n a b i l i t y : w h a t a n d w h y ?
Maartje ter Hoeve [email protected]
User Developer
E x p l a i n a b i l i t y : w h a t a n d w h y ?
B u t w h a t i s a n e x p l a n a t i o n ?
Maartje ter Hoeve [email protected]
An explanation needs to faithfully give the underlying cause of an event
B u t w h a t i s a n e x p l a n a t i o n ?
Maartje ter Hoeve [email protected]
Justification
Provide conceptual explanations that do not necessarily expose the underlying structure of the algorithm
Description
Provide conceptual explanations that do expose the underlying structure of the algorithm
C o n t e n t
Maartje ter Hoeve [email protected]
What and why?
RankingsResearch questions
Related workAnswering research questions
Discussion and conclusion
Blendle
H o w d o w e e x p l a i n a r a n k i n g ?
Maartje ter Hoeve [email protected]
231
203
1
157
6
228
3
543
398
2
231
8
432
4
Only looking at the score of an item is not sufficient
C o n t e n t
Maartje ter Hoeve [email protected]
What and why?
RankingsResearch questions
Related workAnswering research questions
Discussion and conclusion
Blendle
B l e n d l e
Maartje ter Hoeve [email protected]
B l e n d l e
Maartje ter Hoeve [email protected]
Blendle already has heuristic justifications
We use these as one of our baselines
C o n t e n t
Maartje ter Hoeve [email protected]
What and why?
RankingsResearch questions
Related workAnswering research questions
Discussion and conclusion
Blendle
R e s e a r c h q u e s t i o n s
Maartje ter Hoeve [email protected]
RQ2 What way of showing news recommendations reasons do users prefer: textual or visual reasons; a single reason or multiple
reasons; apparent or less apparent reasons?
PART 1
RQ1 Do users want to receive explanations of why particular news items are recommended to them?
R e s e a r c h q u e s t i o n s
Maartje ter Hoeve [email protected]
RQ3 How do we provide users with easy to understand, uncluttered, listwise explanations?
RQ4 How do we build an explanation system that produces faithful, model-agnostic explanations for the outcome of a ranking
algorithm, yet is scalable so that it can run in real time?
RQ5 Does the reading behaviour of users who are provided with model-agnostic listwise explanations for a personalized ranked
selection of news articles differ from the reading behaviour of users who are provided with heuristic or pointwise explanations for a
personalized ranked selection of news articles?
PART 2
C o n t e n t
Maartje ter Hoeve [email protected]
What and why?
RankingsResearch questions
Related workAnswering research questions
Discussion and conclusion
Blendle
R e l a t e d w o r k
Maartje ter Hoeve [email protected]
LIME (Ribeiro et al, 2016): find a local, faithful explanation for the decision of any classifier
LIME is a baseline of this research
We work with rankings, not classifiers
Therefore we bin our ranking scores
mLIME
C o n t e n t
Maartje ter Hoeve [email protected]
What and why?
RankingsResearch questions
Related workAnswering research questions
Discussion and conclusion
Blendle
P a r t 1 - U s e r s t u d y
Single Reason - Visible Single Reason - Invisible Multiple Reasons - Visible Multiple Reasons - Combined Bar chart
179 sent type 1180 sent type 2182 sent type 3
1 2 3
41 answered type 136 answered type 243 answered type 3
Maartje ter Hoeve [email protected]
R Q 1 D o u s e r s w a n t e x p l a n a t i o n s ?
User wants reasons Times answered
Yes 65
Somewhat 24
No 26
I don't know 5
X2 = 14.55, p < 0.001
Maartje ter Hoeve [email protected]
Single Reason - Visible Single Reason - Invisible Multiple Reasons - Visible Multiple Reasons - Combined Bar chart
R Q 2 P r e f e r e n c e s h o w e x p l a n a t i o n s a r e s h o w n ?
Maartje ter Hoeve [email protected]
R Q 2 P r e f e r e n c e s h o w e x p l a n a t i o n s a r e s h o w n ?
Maartje ter Hoeve [email protected]
Transparency
Sufficiency
Trust
Satisfaction
P a r t 2
Maartje ter Hoeve [email protected]
R Q 3 H o w d o w e m a k e l i s t w i s e e x p l a n a t i o n s ?
Maartje ter Hoeve [email protected]
231
203
1
157
6
228
3
543
398
2
231
8
432
4
Explain the entire list?
Explain items in comparison to other items?
Explain which features were important for the
position in the ranking?
R Q 3 H o w d o w e m a k e l i s t w i s e e x p l a n a t i o n s ?
Maartje ter Hoeve [email protected]
231
203
1
157
6
228
3
543
398
2
231
8
432
4
Explain the entire list?
Explain items in comparison to other items?
Explain which features were important for the
position in the ranking?
Maartje ter Hoeve [email protected]
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
Which features are most important for the item's position in the ranking?
R Q 3 H o w d o w e m a k e l i s t w i s e e x p l a n a t i o n s ?
R Q 4 H o w d o w e d e s i g n t h i s ?
Maartje ter Hoeve [email protected]
Main intuition
If we change feature values and the ranking changes, then this feature was important
R Q 4 H o w d o w e d e s i g n t h i s ?
Maartje ter Hoeve [email protected]
Training phase
Find how feature values change the ranking
Find disruptive distributions and points of interests
Use distributions to sample feature values from
Find most important features
Return most important features as explanations
Explaining phase LISTEN - LISTwise ExplaiNer
R Q 4 H o w d o w e d e s i g n t h i s ?
Maartje ter Hoeve [email protected]
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
200
160
120
100
180
Find how feature values change the ranking: f0 [1, 2, 3, 4, 5, 6]
R Q 4 H o w d o w e d e s i g n t h i s ?
Maartje ter Hoeve [email protected]
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
200
160
120
100
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
1 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
180
200
90
120
100
180
Find how feature values change the ranking: f0 [1, 2, 3, 4, 5, 6]
R Q 4 H o w d o w e d e s i g n t h i s ?
Maartje ter Hoeve [email protected]
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
200
160
120
100
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
2 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
180
200
110
120
100
180
Find how feature values change the ranking: f0 [1, 2, 3, 4, 5, 6]
R Q 4 H o w d o w e d e s i g n t h i s ?
Maartje ter Hoeve [email protected]
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
200
160
120
100
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
3 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
180
200
160
120
100
180
Find how feature values change the ranking: f0 [1, 2, 3, 4, 5, 6]
R Q 4 H o w d o w e d e s i g n t h i s ?
Maartje ter Hoeve [email protected]
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
200
160
120
100
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
4 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
180
200
170
120
100
180
Find how feature values change the ranking: f0 [1, 2, 3, 4, 5, 6]
R Q 4 H o w d o w e d e s i g n t h i s ?
Maartje ter Hoeve [email protected]
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
200
160
120
100
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
5 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
180
200
190
120
100
180
Find how feature values change the ranking: f0 [1, 2, 3, 4, 5, 6]
R Q 4 H o w d o w e d e s i g n t h i s ?
Maartje ter Hoeve [email protected]
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
200
160
120
100
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
6 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
180
200
195
120
100
180
Find how feature values change the ranking: f0 [1, 2, 3, 4, 5, 6]
R Q 4 H o w d o w e d e s i g n t h i s ?
Maartje ter Hoeve [email protected]
Find disruptive distributions and points of interests
R Q 4 H o w d o w e d e s i g n t h i s ?
Maartje ter Hoeve [email protected]
Find most important features
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
200
160
120
100
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
180
170
160
120
100
180
R Q 4 H o w d o w e d e s i g n t h i s ?
Maartje ter Hoeve [email protected]
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
f0 f1 f2 f3 f4
Return most important features
R Q 4 H o w d o w e d e s i g n t h i s ?
Maartje ter Hoeve [email protected]
Is this faithful?
Construct dummy data where we know what to expect
LISTEN gives the correct results
Disruptive distribution approach slightly decreases faithfulness, yet increases speed
R Q 4 H o w d o w e d e s i g n t h i s ?
Maartje ter Hoeve [email protected]
This is still not fast enough to run in production
f0 f1 f2 …. fn-1 fn
f0 f1 f2 …. fn-1 fn
f0 f1 f2 …. fn-1 fn
f0 f1 f2 …. fn-1 fn
f0 f1 f2 …. fn-1 fn
f0 f1 f2 …. fn-1 fn
Q-LISTEN
R Q 4 H o w d o w e d e s i g n t h i s ?
Maartje ter Hoeve [email protected]
Same holds for mLIME
Q-mLIME
f0
f1
.
.
fn
f0
f1
.
.
fn
R Q 5 H o w d o u s e r s b e h a v e ?
Maartje ter Hoeve [email protected]
A. Heuristic
A/B/C - test
B. Q-mLIME C. Q-LISTEN
R Q 5 H o w d o u s e r s b e h a v e ?
Maartje ter Hoeve [email protected]
R Q 5 H o w d o u s e r s b e h a v e ?
Maartje ter Hoeve [email protected]
Results
Heuristic Q-mLIME Q-LISTEN
% reasons seen (of total reads) 3.55 3.54 3.51
% reasons seen (of total reasons) 34.1 32.9 32.0
Reasons per user (of users that see reasons) 1.83 1.86 1.72
Article opened within 2 minutes (% of all reasons seen)
12.1 12.9 10.6
Reasons seen after 20 minutes of opening (% of all reasons seen)
11.0 10.1 10.7
C o n t e n t
Maartje ter Hoeve [email protected]
What and why?
RankingsResearch questions
Related workAnswering research questions
Discussion and conclusion
Blendle
C o n c l u s i o n a n d d i s c u s s i o n
Maartje ter Hoeve [email protected]
RQ1 Do users want to receive explanations of why particular news items are recommended to them?
RQ2 What way of showing news recommendations reasons do users prefer: textual or visual reasons; a single reason or multiple
reasons; apparent or less apparent reasons?
PART 1
Yes
No specific one
C o n c l u s i o n a n d d i s c u s s i o n
Maartje ter Hoeve [email protected]
RQ3 How do we provide users with easy to understand, uncluttered, listwise explanations?
RQ4 How do we build an explanation system that produces faithful, model-agnostic explanations for the outcome of a ranking
algorithm, yet is scalable so that it can run in real time?
PART 2
We look at which features were most important for an item's position in the ranking
We find feature importance by changing the feature values and changing the rankings
We make disruptive distributions
We learn the explanation space with a neural net
Maartje ter Hoeve [email protected]
RQ5 Does the reading behaviour of users who are provided with model-agnostic listwise explanations for a personalized ranked
selection of news articles differ from the reading behaviour of users who are provided with heuristic or pointwise explanations for a
personalized ranked selection of news articles?
C o n c l u s i o n a n d d i s c u s s i o n
We do not find any significant differences
C o n c l u s i o n a n d d i s c u s s i o n
Maartje ter Hoeve [email protected]
Take home message
Users clearly state that they want explanations. Therefore, even though their behaviour is not affected by the explanations,
still provide them with faithful explanations
R e a s o n s s e e n
Maartje ter Hoeve [email protected]
L o s s a n d a c c u r a c y c u r v e s
Maartje ter Hoeve [email protected]
m L I M E v s L I S T E N
Maartje ter Hoeve [email protected]
A c t i v e a n d l e s s a c t i v e u s e r s
Maartje ter Hoeve [email protected]
R Q 5 H o w d o u s e r s b e h a v e ?
Maartje ter Hoeve [email protected]