Hierarchical Bayesian Models for Aggregating Retrieved Memories across Individuals
Mark Steyvers
Department of Cognitive Sciences
University of California, Irvine
Joint work with: Michael Lee, Brent Miller, Pernille Hemmer, Bill Batchelder, Paolo Napoletano
Ordering problem:
What is the correct order (in time) of these Presidents?
Thomas Jefferson, Andrew Jackson, James Monroe, George Washington, John Adams
Goal: aggregating responses
Individual orderings: D A B C, A B D C, B A D C, A C B D, A D B C
Aggregation algorithm → group answer: A B C D
Ground truth: A B C D
Does the group answer equal the ground truth?
Bayesian Approach
Individual orderings: D A B C, A B D C, B A D C, A C B D, A D B C
Generative model: the ground truth (A B C D) is the latent common cause of the individual orderings
Important notes:
No communication between individuals
There is always a true answer (ground truth)
Aggregation algorithm never has access to ground truth; ground truth is only used for evaluation
Matching problem:
Rembrandt, Van Gogh, Monet, Renoir
A, B, C, D
Wisdom of crowds phenomenon
Crowd estimate is often better than any individual in the crowd
(Think of independent noise influencing each individual)
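The independent-noise intuition can be illustrated with a small simulation. This is only a sketch under assumed values: the true quantity (100), the noise level (10), and the crowd size (100) are hypothetical and do not come from the experiments in this talk.

```python
import random
from statistics import mean

rng = random.Random(0)
truth = 100.0        # hypothetical true value
noise_sd = 10.0      # hypothetical individual noise level

# Each individual reports the truth plus independent Gaussian noise.
estimates = [truth + rng.gauss(0.0, noise_sd) for _ in range(100)]

# Error of the averaged (crowd) estimate versus the average individual error.
crowd_error = abs(mean(estimates) - truth)
mean_individual_error = mean(abs(e - truth) for e in estimates)

print(f"crowd error:           {crowd_error:.2f}")
print(f"mean individual error: {mean_individual_error:.2f}")
```

Because the noise terms partly cancel in the average, the crowd error cannot exceed the mean individual error, and with independent noise it is typically much smaller.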
Examples of wisdom of crowds phenomenon
Who wants to be a millionaire?
Galton’s Ox (1907): Median of individual estimates comes close to true answer
Limitations of Current “Wisdom of Crowds” Research
Studies restricted to numeric or categorical judgments
Simple averaging schemes: mode, median, mean
No treatment of individual differences: every “vote” is treated equally; downplayed role of expertise
Cultural Consensus Theory (CCT)
E.g. Romney, Batchelder, and Weller (1987)
Finds the “answer key” to multiple choice questions when ground truth is lost; takes person and item differences into account
Informal version of CCT also developed for ranking data
Research Goals
Generalize “wisdom of crowds” effect to more complex data
Aggregation of permutations: ranking data, matching (assignment) data
Hierarchical Bayesian Models
Probability distributions over all permutations of items: with N items, there are N! permutations; e.g., when N=44, we have 44! > 10^53 permutations
Approximate inference methods: MCMC
Cognitively plausible generative processes
Treatment of individual differences
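The size of the permutation space for N=44 can be checked directly. A minimal sketch using only the Python standard library:

```python
import math

n_items = 44
n_orderings = math.factorial(n_items)  # number of distinct orderings of 44 items

print(f"44! is about {n_orderings:.3e}")
print("exceeds 10^53:", n_orderings > 10**53)
```

A space this large rules out enumerating every ordering, which is why the talk turns to approximate inference (MCMC).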
Part I: Ordering Problems
Experiment 1
Task: order all 44 US presidents
Methods: 26 participants (college undergraduates); names of presidents written on cards; cards could be shuffled on large table
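Measuring performance
Kendall’s Tau: the number of adjacent pairwise swaps needed to turn the participant's ordering into the ground truth
Participant ordering: 1 2 5 3 4
Ground truth: 1 2 3 4 5
First swap: 1 2 5 3 4 → 1 2 3 5 4 (τ = 1)
Second swap: 1 2 3 5 4 → 1 2 3 4 5 (τ = 1+1 = 2)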
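The swap count above can be computed by counting pairs of items that appear in the wrong relative order, which equals the minimum number of adjacent swaps. A minimal sketch; the function name `kendall_tau_distance` is an illustrative choice, not something from the talk:

```python
from itertools import combinations

def kendall_tau_distance(ordering, truth):
    """Count item pairs whose relative order differs from the ground truth."""
    position = {item: rank for rank, item in enumerate(truth)}
    ranks = [position[item] for item in ordering]
    # Every out-of-order pair costs exactly one adjacent swap to repair.
    return sum(1 for a, b in combinations(ranks, 2) if a > b)

# The example from the slide: two adjacent swaps are needed.
print(kendall_tau_distance([1, 2, 5, 3, 4], [1, 2, 3, 4, 5]))  # 2
```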
Empirical Results
[Figure: performance of each individual, ordered from best to worst, with random guessing shown for reference]
Many approaches for analyzing rank data…
Probabilistic models: Thurstone (1927), Mallows (1957), Plackett-Luce (1975), Lebanon-Mao (2008)
Spectral methods: Diaconis (1989)
Heuristic methods from voting theory: Borda count
… however, many of these approaches were developed for preference rankings
Bayesian Thurstonian Approach
Each item (A, B, C) has a true coordinate on some dimension
… but there is noise because of encoding and/or retrieval error
Each person’s mental representation is based on (latent) samples of these distributions
The observed ordering is based on the ordering of the samples
People draw from distributions with common mean but different variances
Person 1, observed ordering: A < B < C
Person 2, observed ordering: A < C < B
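The generative story on these slides can be sketched as a short simulation. The item coordinates and the two noise levels below are hypothetical values chosen for illustration; only the structure (common means, person-specific noise, ordering of the samples) follows the slides.

```python
import random

def simulate_ordering(true_means, sigma, rng):
    """Draw one noisy sample per item and return the items sorted by sample."""
    samples = {item: rng.gauss(mu, sigma) for item, mu in true_means.items()}
    return sorted(samples, key=samples.get)

rng = random.Random(0)
true_means = {"A": 1.0, "B": 2.0, "C": 3.0}            # hypothetical coordinates
person_sigmas = {"Person 1": 0.2, "Person 2": 1.5}     # hypothetical noise levels

for person, sigma in person_sigmas.items():
    print(person, simulate_ordering(true_means, sigma, rng))
```

A person with small sigma almost always reproduces the true order, while a person with large sigma often swaps neighboring items, which is the individual difference the model is meant to capture.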
Graphical Model Notation
A plate around $x_j$ with $j = 1..3$ stands for the separate nodes $x_1$, $x_2$, $x_3$
shaded = observed; not shaded = latent
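Graphical Model of Bayesian Thurstonian Model
[Graphical model: plate over j individuals]
Latent ground truth: $\mu$
Individual ability: $\sigma_j \sim \mathrm{Gamma}(\lambda, 1/\lambda)$
Mental representation: $x_{ij} \mid \mu_i, \sigma_j \sim \mathrm{N}(\mu_i, \sigma_j)$
Observed ordering: $y_j = \mathrm{rank}(x_j)$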
Inference
Need the posterior distribution $p(\mu, \sigma, x \mid y)$
Markov Chain Monte Carlo: Gibbs sampling on $x_j$; Metropolis-Hastings on $\mu$ and $\sigma_j$
Draw 400 samples; group ordering based on average of $\mu$ across samples
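The last step, turning posterior samples into a group ordering, can be sketched as follows. The sampler itself is not shown; the three draws of $\mu$ below are made-up numbers standing in for MCMC output, and `group_ordering` is an illustrative name.

```python
def group_ordering(mu_samples):
    """Average each item's coordinate over samples, then sort items by the mean."""
    items = list(mu_samples[0])
    n = len(mu_samples)
    mean_mu = {item: sum(s[item] for s in mu_samples) / n for item in items}
    return sorted(mean_mu, key=mean_mu.get)

# Hypothetical posterior draws of the latent coordinates (not real MCMC output).
mu_samples = [
    {"A": 0.9, "B": 2.1, "C": 2.0},
    {"A": 1.2, "B": 1.8, "C": 3.1},
    {"A": 1.0, "B": 2.4, "C": 2.9},
]
print(group_ordering(mu_samples))  # ['A', 'B', 'C']
```

Note that a single draw (the first one here) can order B and C differently from the average, which is one reason to summarize over many samples.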
Wisdom of Crowds effect
[Figure: Thurstonian model compared with individuals]
model’s ordering is as good as best individual
Inferred Distributions for 44 US Presidents
George Washington (1)
John Adams (2)
Thomas Jefferson (3)
James Madison (4)
James Monroe (6)
John Quincy Adams (5)
Andrew Jackson (7)
Martin Van Buren (8)
William Henry Harrison (21)
John Tyler (10)
James Knox Polk (18)
Zachary Taylor (16)
Millard Fillmore (11)
Franklin Pierce (19)
James Buchanan (13)
Abraham Lincoln (9)
Andrew Johnson (12)
Ulysses S. Grant (17)
Rutherford B. Hayes (20)
James Garfield (22)
Chester Arthur (15)
Grover Cleveland 1 (23)
Benjamin Harrison (14)
Grover Cleveland 2 (25)
William McKinley (24)
Theodore Roosevelt (29)
William Howard Taft (27)
Woodrow Wilson (30)
Warren Harding (26)
Calvin Coolidge (28)
Herbert Hoover (31)
Franklin D. Roosevelt (32)
Harry S. Truman (33)
Dwight Eisenhower (34)
John F. Kennedy (37)
Lyndon B. Johnson (36)
Richard Nixon (39)
Gerald Ford (35)
James Carter (38)
Ronald Reagan (40)
George H.W. Bush (41)
William Clinton (42)
George W. Bush (43)
Barack Obama (44)
[Figure annotation: median and minimum sigma]
Model is calibrated
[Figure: scatter plot of individuals, R=0.941]
Individuals with large sigma are far from the truth
Alternative Models
Many heuristic methods from voting theory, e.g., Borda count method
Suppose we have 10 items: assign a count of 10 to first item, 9 for second item, etc.; add counts over individuals; order items by the Borda count
i.e., rank by average rank across people
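The Borda count described above can be written in a few lines. This sketch applies it to the five four-item orderings shown on the "Goal: aggregating responses" slide; the function name `borda_ordering` is an illustrative choice.

```python
def borda_ordering(orderings):
    """Give n points to first place, n-1 to second, ...; sort by total points."""
    n_items = len(orderings[0])
    counts = {}
    for ordering in orderings:
        for position, item in enumerate(ordering):
            counts[item] = counts.get(item, 0) + (n_items - position)
    return sorted(counts, key=lambda item: -counts[item])

# The five individual orderings from the aggregation slide.
orderings = [list(o) for o in ["DABC", "ABDC", "BADC", "ACBD", "ADBC"]]
print(borda_ordering(orderings))  # ['A', 'B', 'D', 'C'] with counts 18, 13, 12, 7
```

Every individual gets the same weight here, which is the property the Thurstonian model relaxes by estimating a noise level per person.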
Model Comparison
[Figure: Thurstonian model and Borda count compared with individuals]
Experiment 2
78 participants
17 problems, each with 10 items
Chronological events, physical measures, and purely ordinal problems, e.g. Ten Amendments, Ten Commandments
Ordering states west-east
Oregon (1)
Utah (2)
Nebraska (3)
Iowa (4)
Alabama (6)
Ohio (5)
Virginia (7)
Delaware (8)
Connecticut (9)
Maine (10)
[Figure: scatter plot, R=0.961]
Ordering Ten Amendments
Freedom of speech & religion (1)
Right to bear arms (2)
No quartering of soldiers (4)
No unreasonable searches (3)
Due process (5)
Trial by Jury (6)
Civil Trial by Jury (7)
No cruel punishment (8)
Right to non-specified rights (10)
Power for the States & People (9)
[Figure: scatter plot for ten amendments, R=0.889]
Ordering Ten Commandments
Worship any other God (1)
Make a graven image (7)
Take the Lord's name in vain (2)
Break the Sabbath (3)
Dishonor your parents (4)
Murder (6)
Commit adultery (8)
Steal (5)
Bear false witness (9)
Covet (10)
[Figure: scatter plot, R=0.722]
Average results over 17 Problems
[Figure: mean performance of the Thurstonian model, Borda count, and mode compared with individuals]
Effect of Group Composition
How many individuals do we need to average over?
Effect of Group Size: random groups
[Figure: performance as a function of group size; legend entries T=0, T=2, T=12]
Experts vs. Crowds
Can we find experts in the crowd? Can we form small groups of experts?
Approach: form a group for some particular task; select individuals with the smallest sigma (“experts”) based on previous tasks; vary the number of previous tasks
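The selection step in this approach amounts to sorting people by their inferred sigma from earlier tasks and keeping the top few. A minimal sketch; the sigma values and the helper name `select_experts` are hypothetical.

```python
def select_experts(sigma_by_person, group_size):
    """Return the group_size people with the smallest inferred sigma."""
    return sorted(sigma_by_person, key=sigma_by_person.get)[:group_size]

# Hypothetical sigmas inferred from previous tasks (smaller = more accurate).
sigma_by_person = {"p1": 0.8, "p2": 0.3, "p3": 1.4, "p4": 0.5}
print(select_experts(sigma_by_person, 2))  # ['p2', 'p4']
```

No ground truth is needed for this step, since sigma is inferred by the model from agreement within the crowd.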
Group Composition based on prior performance
[Figure: performance as a function of group size (best individuals first), with one curve per number of previous tasks T; labels shown: T=0, T=2, T=8, T=12]
Methods for Selecting Experts
Endogenous: no feedback required
Exogenous: selecting people based on actual performance
[Figure: two panels showing performance as a function of group size]
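Model incorporating overall person ability
[Graphical model: plates over j individuals and m tasks]
Overall ability: $\lambda_j \sim \mathrm{Gamma}(\lambda_0, 1/\lambda_0)$
Task specific ability: $\sigma_{jm} \sim \mathrm{Gamma}(\lambda_j, 1/\lambda_j)$
$x_{ijm} \mid \mu_{im}, \sigma_{jm} \sim \mathrm{N}(\mu_{im}, \sigma_{jm})$
$y_{jm} = \mathrm{rank}(x_{jm})$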
Average results over 17 Problems
[Figure: mean performance of Thurstonian Model v1, Thurstonian Model v2 (new model), Borda count, and mode compared with individuals]
Part II: Ordering Problems in Episodic Memory
Another ordering problem:
http://www.youtube.com/watch?v=29VGZtnCD30&feature=related
Order items A, B, C, D in time
Experiment 3
26 participants
6 videos: 3 videos with stereotyped event sequences (e.g. wedding); 3 “unpredictable” videos (e.g., example video); extracted 10 stills for testing
Method: study video followed by immediate ordering test of 10 items
Bayesian Thurstonian Model
event1 (1)
event2 (2)
event3 (3)
event4 (4)
event5 (7)
event6 (6)
event7 (5)
event8 (8)
event9 (9)
event10 (10)
yogurt commercial: τ = 3
[Figure: scatter plot, R=0.890]
Two other examples
event1 (1)
event2 (2)
event3 (3)
event4 (4)
event5 (6)
event6 (5)
event7 (7)
event8 (8)
event9 (9)
event10 (10)
clay animation: τ = 1

event1 (1)
event2 (2)
event3 (3)
event4 (4)
event5 (5)
event6 (6)
event7 (7)
event8 (8)
event9 (9)
event10 (10)
wedding: τ = 0
Overall Results
[Figure: mean performance of the Thurstonian model, Borda count, and mode compared with individuals]
Part III: Matching Problems
Example Matching Problem (one-to-one)
Dutch
Danish
Yiddish
Thai
Vietnamese
Chinese
Georgian
Russian
Japanese
A
B
C
D
E
F
G
H
I
godt nytår
gelukkig nieuwjaar
a gut yohr
С Новым Годом
สวัสดีปีใหม่
Chúc Mừng Nǎm Mới
გილოცავთ ახალწელს
Experiment
17 participants
8 matching problems, e.g. car logos and brand names; first and last names of philosophers; flags and countries; Greek symbols and letter names
Number of items varied between 10 and 24; with 24 items, we have 24! possibilities
Overall Results
[Figure: mean accuracy for each of the 17 individuals]
Heuristic Aggregation Approach
Combinatorial optimization problem: maximize agreement in assigning N items to N responses
Hungarian algorithm: construct a count matrix M, where $M_{ij}$ = number of people that paired item i with response j; find row and column permutations to maximize diagonal sum; $O(n^3)$
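The optimization target can be made concrete with a tiny example. This sketch does not implement the Hungarian algorithm; it uses a brute-force search over all permutations, which finds the same maximum-agreement assignment but is only feasible for very small N. The 4x4 count matrix is hypothetical.

```python
from itertools import permutations

def best_assignment(counts):
    """Return perm where item i is assigned response perm[i], maximizing agreement."""
    n = len(counts)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(counts[i][perm[i]] for i in range(n)),
    )
    return list(best)

# Hypothetical count matrix: counts[i][j] = people pairing item i with response j.
counts = [
    [7, 3, 0, 2],
    [2, 3, 1, 3],
    [0, 1, 9, 1],
    [3, 3, 0, 2],
]
print(best_assignment(counts))  # [0, 3, 2, 1], total agreement 22
```

Note that item 1 is not assigned its own most popular response in isolation; the assignment is chosen jointly so that each response is used exactly once.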
Hungarian Algorithm Example
Count matrix: rows are greetings, columns are the languages assigned by participants (Dut = Dutch, Dan = Danish, Fre = French, Jap = Japanese, Spa = Spanish, Ara = Arabic, Chi = Chinese, Ger = German, Ita = Italian, Rus = Russian, Tha = Thai, Vie = Vietnamese, Wel = Welsh, Geo = Georgian, Yid = Yiddish). On the slide, cell colors mark correct and incorrect assignments.
Greeting               Dut Dan Fre Jap Spa Ara Chi Ger Ita Rus Tha Vie Wel Geo Yid
gelukkig Nieuwjaar       7   3   0   0   0   1   0   0   0   0   0   0   2   0   2
godt nytår               2   3   0   0   0   0   0   2   0   2   0   0   1   3   2
bonne année              0   0  14   0   1   0   0   0   0   0   0   0   0   0   0
(Japanese greeting)      0   0   0   9   0   0   2   0   1   0   3   0   0   0   0
feliz año nuevo          0   0   0   0  14   0   0   0   0   0   1   0   0   0   0
عام سعيد                 0   1   0   0   0  14   0   0   0   0   0   0   0   0   0
(Chinese greeting)       0   0   0   2   0   0  12   0   0   0   0   1   0   0   0
ein gutes neues Jahr     3   1   0   0   0   0   0   9   0   0   0   0   1   0   1
felice anno nuovo        0   0   0   0   0   0   0   0  14   1   0   0   0   0   0
С Новым Годом            0   0   1   0   0   0   0   0   0  11   0   0   1   2   0
สวัสดีปีใหม่               0   0   0   1   0   0   1   0   0   0   7   1   1   4   0
Chúc Mừng Nǎm Mới        0   0   0   0   0   0   0   0   0   1   0  11   1   2   0
Blwyddyn Newydd Dda      0   4   0   1   0   0   0   0   0   0   1   0   6   1   2
გილოცავთ ახალ წელს       0   0   0   2   0   0   0   1   0   0   3   2   0   1   6
a gut yohr               3   3   0   0   0   0   0   3   0   0   0   0   2   2   2
Hungarian Algorithm Results (2)
Leonardo da Vinci     9  5  1  0  0  0  0  0  0  0
Jan Vermeer           0  7  3  0  2  2  0  0  1  0
Rembrandt van Rijn    1  0  4  3  1  1  1  1  3  0
Pablo Picasso         0  0  1  6  6  0  0  1  1  0
Vincent van Gogh      0  0  0  1  6  4  0  1  2  1
Renoir                1  0  1  0  0  3  7  2  1  0
Monet                 0  1  1  0  0  2  5  3  0  3
Jan Van Eyck          0  2  3  2  0  0  0  5  3  0
Edvard Munch          0  0  1  0  0  3  0  2  4  5
Salvador Dali         4  0  0  3  0  0  2  0  0  6
Bayesian Matching Model
Proposed process:
- match “known” items
- guess between remaining ones
Individual differences:
- some items easier to know
- some participants know more
Dutch
Danish
Yiddish
Russian
godt nytår
gelukkig nieuwjaar
a gut yohr
С Новым Годом
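Graphical Model
[Graphical model: plates over i items and j individuals]
Latent ground truth: $z$
Observed matching: $y_j$
Knowledge state: $x_{ij} \sim \mathrm{Bernoulli}(s_{ij})$
Prob. of knowing: $\mathrm{logit}(s_{ij}) = d_i + a_j$
$p(y_{ij} = z_{ij}) = \begin{cases} 1 & x_{ij} = 1 \\ 1/n! & x_{ij} = 0 \end{cases}$
Person ability: $a_j$; item easiness: $d_i$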
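The proposed process (match the known items, guess among the rest) can be sketched as a simulation. Several things here are assumptions for illustration: the additive form of easiness and ability inside the logistic, the numeric values, and the helper name `simulate_matching`.

```python
import math
import random

def simulate_matching(truth, easiness, ability, rng):
    """Simulate one person's matching: known items are correct, the rest are guessed."""
    response = list(truth)
    # An item is "known" with probability logistic(easiness + ability) (assumed form).
    unknown = [
        i for i, d in enumerate(easiness)
        if rng.random() >= 1.0 / (1.0 + math.exp(-(d + ability)))
    ]
    # Responses left over from unknown items are shuffled among those items.
    leftovers = [truth[i] for i in unknown]
    rng.shuffle(leftovers)
    for i, r in zip(unknown, leftovers):
        response[i] = r
    return response

rng = random.Random(0)
truth = ["Dutch", "Danish", "Yiddish", "Russian"]   # correct response per item
easiness = [0.5, -1.0, -0.5, 1.5]                   # hypothetical item easiness
print(simulate_matching(truth, easiness, ability=0.0, rng=rng))
```

Because guesses are drawn only from the responses that remain, every simulated answer is still a one-to-one matching, and a person can be correct on an unknown item by chance.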
Overall Modeling Results
[Figure: mean accuracy of Bayesian matching and the Hungarian algorithm compared with individuals]
Calibration at level of items and people (for paintings problem)
[Figure: inferred versus actual values, one panel per problem]
ITEMS, D (inferred) vs. D (actual): Greek symbols R=0.953; Philosophers R=0.978; Flags R=0.973; Paintings R=0.916; US presidents R=0.960; Car logos R=0.918; Languages R=0.947; Sport balls R=0.963
INDIVIDUALS, A (inferred) vs. A (actual): Greek symbols R=0.990; Philosophers R=0.992; Flags R=0.987; Paintings R=0.975; US presidents R=0.992; Car logos R=0.992; Languages R=0.968; Sport balls R=0.995
How predictive are subject provided confidence ratings?
[Figure: accuracy as a function of the number of guesses (0, 1-2, 3-4, 5+), in two panels: # guesses estimated by individual (r=-.42) and # guesses estimated by model, based on variable A (r=-.77)]
Part IV: Open Issues
When do we get wisdom of crowds effect?
Independent errors: different people knowing different things
Population response centered around ground truth
Some minimal number of individuals: 10-20 individuals often sufficient
What are methods for finding experts?
1) Self-reported expertise: unreliable; has led to claims of “myth of expertise”
2) Based on explicit scores by comparing to ground truth, but ground truth might not be immediately available
3) Endogenously discover experts: use the crowd to discover experts; small groups of experts can be effective
What to do about systematic biases?
In some tasks, individuals systematically distort the ground truth: spatial and temporal distortions; memory distortions (e.g. false memory); decision-making distortions
Does this diminish the wisdom of crowds effect? Maybe… but a model that predicts these systematic distortions might be able to “undo” them
Can we build domain specific models?
Thurstonian model applied to wide variety of problems
How about domain specific models? E.g., apply serial recall models to serial recall; better specify sources of noise; model systematic biases
67
Results separated by problem
Column groups: Humans (PC, τ), then five model groups (C, τ, Rank each). Models named on the slide: Thurstone v1, Thurstone v2, Mallows Model, Borda Counts, Mode.
Problem                  PC    τ *  |  C    τ    Rank |  C    τ    Rank |  C    τ    Rank |  C    τ    Rank |  C    τ    Rank
books                    .000  10   |  0    6    88   |  0    6    88   |  0    5    91   |  0    7    82   |  0    12   40
city population europe   .000  15   |  0    12   77   |  0    12   77   |  0    12   77   |  0    11   81   |  0    17   42
city population us       .000  14   |  0    9    90   |  0    8    91   |  0    7    96   |  0    12   67   |  0    16   45
city population world    .000  18   |  0    16   73   |  0    16   73   |  0    16   73   |  0    15   77   |  0    19   44
country landmass         .000  9    |  0    5    95   |  0    6    85   |  0    5    95   |  0    5    95   |  0    7    76
country population       .000  15   |  0    10   87   |  0    10   87   |  0    11   82   |  0    11   82   |  0    15   53
hardness                 .000  15   |  0    14   64   |  0    13   73   |  0    14   64   |  0    11   91   |  0    15   46
holidays                 .051  8    |  0    5    77   |  0    5    77   |  0    5    77   |  0    4    78   |  1    0    100
movies releasedate       .013  6    |  0    1    99   |  0    1    99   |  0    2    95   |  0    2    95   |  0    2    95
oscar bestmovies         .013  10   |  0    4    90   |  0    3    97   |  0    4    90   |  0    3    97   |  0    3    97
oscar movies             .000  10   |  0    1    100  |  0    1    100  |  0    1    100  |  0    2    96   |  0    2    96
presidents               .064  7    |  0    1    94   |  1    0    100  |  0    1    94   |  0    3    79   |  1    0    100
rivers                   .000  15   |  0    11   91   |  0    12   86   |  0    14   67   |  0    11   91   |  0    16   42
states westeast          .026  6    |  0    1    97   |  0    1    97   |  0    2    88   |  0    3    78   |  0    1    97
superbowl                .000  17   |  0    13   86   |  0    12   88   |  0    15   71   |  0    10   96   |  0    19   40
ten amendments           .013  13   |  0    2    97   |  0    1    99   |  0    3    96   |  0    5    90   |  0    4    95
ten commandments         .000  17   |  0    7    91   |  0    7    91   |  0    7    91   |  0    12   74   |  0    17   51
AVERAGE                  .011  12.1 |  .00  6.94 88.0 |  .06  6.71 88.8 |  .00  7.29 85.1 |  .00  7.47 85.3 |  .12  9.67 68.2
Notes
Noise in Thurstonian models: acquisition / encoding noise; retrieval noise
Link to crowd within (Ed Vul): are our results due to wisdom of crowds or individuals? Probably a bit of both, and we cannot tell with our experiments. However, there is probably a fair amount of encoding noise that would not benefit from repeated measurements within individuals. Different individuals probably do know different things.
To Do
Compare explicitly estimated number of guesses with latent confidence
Identifiability issue: fix mean A?
Hierarchical model: test on small numbers of subjects
Model comparisons on small sets of subjects
Look at kurtosis of sigma distributions
Modeling Group Serial Recall
Goal: infer distribution over orderings of events given verbal reports, i.e., P( original order | verbal report )
Many models for serial recall, e.g. Estes Perturbation model (1972), Shiffrin & Cook (1978), SOB (2002), SIMPLE (2007)
but many of these models do not have a likelihood function p( item 1, item 2, …, item N | memory contents )
Bayesian Algorithm: not every person has equal weight
[Table: the same 15 x 15 greeting-by-language count matrix as on the "Hungarian Algorithm Example" slide, with cells marked correct or incorrect]
Summary of Findings
Extended wisdom of crowds to combinatorial problems: approximate inference (MCMC) to infer probability distributions over permutations
Bayesian methods that are calibrated: we can tell who is likely to be accurate without having ground truth available
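Graphical Model
[Graphical model: plates over i items and j individuals]
Latent ground truth: $z$
Observed matching: $y_j$
Knowledge state: $x_{ij} \sim \mathrm{Bernoulli}(s_{ij})$
Prob. of knowing: $\mathrm{logit}(s_{ij}) = d_i + a_j$
$p(y_{ij} = z_{ij}) = \begin{cases} 1 & x_{ij} = 1 \\ 1/n! & x_{ij} = 0 \end{cases}$
Item and person parameters: $d_i$, $a_j$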
When do we get Wisdom of Crowds effect?
Analyze model performance in a variety of tasks
MDS solution of pairwise tau distances
[Figure: MDS plot for states westeast, showing individuals, truth, and Thurstonian model; point labels give distance to truth]
MDS solution of pairwise tau distances
[Figure: MDS plot for ten commandments, showing individuals, truth, and Thurstonian model]
Modeling Performance Across Tasks
Current model is applied independently across tasks
Extend hierarchical model with random effects approach to tasks: each person has an overall ability (Spearman’s “g”); ability in a specific task varies around overall ability