Hierarchical Bayesian Models for Aggregating Retrieved Memories across Individuals


Mark Steyvers

Department of Cognitive Sciences

University of California, Irvine

Joint work with: Michael Lee, Brent Miller, Pernille Hemmer, Bill Batchelder, Paolo Napoletano

Ordering problem:

what is the correct order of these Presidents?

Thomas Jefferson, Andrew Jackson, James Monroe, George Washington, John Adams (to be placed along a time axis)

Goal: aggregating responses

Individual responses: D A B C, A B D C, B A D C, A C B D, A D B C

Aggregation algorithm produces group answer: A B C D

Ground truth: A B C D

group answer =? ground truth

Bayesian Approach

Individual responses: D A B C, A B D C, B A D C, A C B D, A D B C

Generative Model

ground truth = latent common cause: A B C D

Important notes:

No communication between individuals

There is always a true answer (ground truth)

Aggregation algorithm never has access to ground truth; ground truth is only used for evaluation

Matching problem:

Match each painter (Rembrandt, Van Gogh, Monet, Renoir) to one of the items A, B, C, D

Wisdom of crowds phenomenon

Crowd estimate is often better than any individual in the crowd

(Think of independent noise influencing each individual)


Examples of wisdom of crowds phenomenon

Who Wants to Be a Millionaire?

Galton’s Ox (1907): median of individual estimates comes close to the true answer

Limitations of Current “Wisdom of Crowds” Research

Studies restricted to numeric or categorical judgments; simple averaging schemes: mode, median, mean

No treatment of individual differences: every “vote” is treated equally; downplayed role of expertise

Cultural Consensus Theory (CCT)

E.g. Romney, Batchelder, and Weller (1987)

Finds the “answer key” to multiple choice questions when ground truth is lost; takes person and item differences into account

Informal version of CCT also developed for ranking data

Research Goals

Generalize “wisdom of crowds” effect to more complex data

Aggregation of permutations: ranking data; matching (assignment) data

Hierarchical Bayesian Models

Probability distributions over all permutations of items

With N items, there are N! permutations; e.g., when N=44, we have 44! > 10^53 permutations

Approximate inference methods: MCMC

Cognitively plausible generative processes

Treatment of individual differences
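A quick check of the size of the permutation space mentioned on this slide, as a minimal sketch. The function name `num_orderings` is only illustrative.

```python
import math

def num_orderings(n_items: int) -> int:
    """Number of distinct orderings (permutations) of n_items items."""
    return math.factorial(n_items)

n = 44
count = num_orderings(n)
# Exhaustive enumeration is hopeless at this size, which is why the
# slide turns to approximate inference (MCMC).
print(f"{n}! has {len(str(count))} digits")
print(count > 10**53)
```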


Part I: Ordering Problems

Experiment 1

Task: order all 44 US presidents

Methods: 26 participants (college undergraduates); names of presidents written on cards; cards could be shuffled on large table

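Measuring performance

Kendall’s Tau: the number of adjacent pairwise swaps needed to turn the participant’s ordering into the ground truth

Participant ordering: 1 2 5 3 4

Ground truth: 1 2 3 4 5

Swap 1: 1 2 5 3 4 becomes 1 2 3 5 4 (τ = 1)

Swap 2: 1 2 3 5 4 becomes 1 2 3 4 5 (τ = 1+1 = 2)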
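The swap count above can be computed without simulating the swaps: it equals the number of item pairs that appear in opposite relative order in the two orderings. A minimal sketch, assuming orderings are lists of distinct items (the function name is illustrative):

```python
from itertools import combinations

def kendall_tau_distance(ordering, truth):
    """Count item pairs whose relative order differs between the two orderings.

    This pair count equals the minimum number of adjacent swaps needed to
    turn `ordering` into `truth`.
    """
    position = {item: idx for idx, item in enumerate(ordering)}
    return sum(
        1
        for a, b in combinations(truth, 2)  # a precedes b in the truth
        if position[a] > position[b]        # but a follows b in the response
    )

tau = kendall_tau_distance([1, 2, 5, 3, 4], [1, 2, 3, 4, 5])
print(tau)  # 2, matching the slide's example
```

The pairwise version costs O(N^2) comparisons, which is negligible for 10 or 44 items.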

Empirical Results

[Figure: performance of each individual (individuals ordered from best to worst), with random guessing marked for reference]

Many approaches for analyzing rank data…

Probabilistic models: Thurstone (1927), Mallows (1957), Plackett-Luce (1975), Lebanon-Mao (2008)

Spectral methods: Diaconis (1989)

Heuristic methods from voting theory: Borda count

… however, many of these approaches were developed for preference rankings

Bayesian Thurstonian Approach

Each item has a true coordinate on some dimension (items A, B, C)

… but there is noise because of encoding and/or retrieval error

Each person’s mental representation is based on (latent) samples of these distributions

The observed ordering is based on the ordering of the samples

Person 1, observed ordering: A < B < C

People draw from distributions with common mean but different variances

Person 1, observed ordering: A < B < C

Person 2, observed ordering: A < C < B
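The generative story on these slides can be simulated in a few lines. This is a minimal sketch, not the authors' code: the item coordinates and the two noise levels are hypothetical values chosen only for illustration.

```python
import random

def simulate_ordering(true_means, sigma, rng):
    """Draw one noisy sample per item and return the items sorted by sample."""
    samples = {item: rng.gauss(mu, sigma) for item, mu in true_means.items()}
    return sorted(samples, key=samples.get)

rng = random.Random(0)
true_means = {"A": 0.0, "B": 1.0, "C": 2.0}   # hypothetical true coordinates
sigmas = {"Person 1": 0.2, "Person 2": 1.5}   # hypothetical noise levels
for person, sigma in sigmas.items():
    print(person, simulate_ordering(true_means, sigma, rng))
```

A person with small sigma will almost always reproduce A < B < C, while a person with large sigma will produce swaps more often.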

Graphical Model Notation

Plate notation: a node $x_j$ in a plate labeled j=1..3 stands for the three nodes $x_1$, $x_2$, $x_3$

shaded = observed; not shaded = latent

Graphical Model of Bayesian Thurstonian Model

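Latent ground truth: $\mu$

Individual ability: $\sigma_j \sim \mathrm{Gamma}(\lambda, 1/\lambda)$

Mental representation: $x_{ij} \mid \mu_i, \sigma_j \sim \mathrm{N}(\mu_i, \sigma_j)$

Observed ordering: $y_j = \mathrm{rank}(x_j)$

Plate: j individuals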

Inference

Need the posterior distribution $p(\mu, \sigma, x \mid y)$

Markov chain Monte Carlo: Gibbs sampling on $x_j$; Metropolis-Hastings on $\mu$ and $\sigma_j$

Draw 400 samples; group ordering based on the average of $\mu$ across samples
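The last step on this slide, turning posterior samples into a group ordering, can be sketched as follows. The sampler itself is not shown; `mu_samples` is a hypothetical stand-in for its output (one list of item coordinates per retained sample), and the three draws below are made-up numbers.

```python
def group_ordering(mu_samples, items):
    """Average each item's sampled coordinate, then sort items by that mean."""
    n_samples = len(mu_samples)
    mean_mu = [
        sum(sample[i] for sample in mu_samples) / n_samples
        for i in range(len(items))
    ]
    return [item for _, item in sorted(zip(mean_mu, items))]

# Hypothetical posterior samples of mu for items A, B, C (three draws only).
mu_samples = [
    [0.1, 0.9, 2.2],
    [0.3, 1.4, 1.2],
    [-0.2, 1.0, 2.0],
]
print(group_ordering(mu_samples, ["A", "B", "C"]))  # ['A', 'B', 'C']
```

Note that the second draw on its own would order C before B; averaging across draws smooths this out.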

Wisdom of Crowds effect

[Figure: performance of the Thurstonian model compared with the individuals]

model’s ordering is as good as best individual

Inferred Distributions for 44 US Presidents

George Washington (1)
John Adams (2)
Thomas Jefferson (3)
James Madison (4)
James Monroe (6)
John Quincy Adams (5)
Andrew Jackson (7)
Martin Van Buren (8)
William Henry Harrison (21)
John Tyler (10)
James Knox Polk (18)
Zachary Taylor (16)
Millard Fillmore (11)
Franklin Pierce (19)
James Buchanan (13)
Abraham Lincoln (9)
Andrew Johnson (12)
Ulysses S. Grant (17)
Rutherford B. Hayes (20)
James Garfield (22)
Chester Arthur (15)
Grover Cleveland 1 (23)
Benjamin Harrison (14)
Grover Cleveland 2 (25)
William McKinley (24)
Theodore Roosevelt (29)
William Howard Taft (27)
Woodrow Wilson (30)
Warren Harding (26)
Calvin Coolidge (28)
Herbert Hoover (31)
Franklin D. Roosevelt (32)
Harry S. Truman (33)
Dwight Eisenhower (34)
John F. Kennedy (37)
Lyndon B. Johnson (36)
Richard Nixon (39)
Gerald Ford (35)
James Carter (38)
Ronald Reagan (40)
George H.W. Bush (41)
William Clinton (42)
George W. Bush (43)
Barack Obama (44)

median and minimum sigma

Model is calibrated

[Figure: scatter plot across individuals, R=0.941]

Individuals with large sigma are far from the truth

Alternative Models

Many heuristic methods from voting theory, e.g., Borda count method

Suppose we have 10 items: assign a count of 10 to first item, 9 for second item, etc.; add counts over individuals; order items by the Borda count

i.e., rank by average rank across people
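The Borda procedure described above can be sketched as follows, applied to the five example responses from the aggregation slide at the start of the talk. The alphabetical tie-break is an arbitrary choice made here, not something the slides specify.

```python
from collections import defaultdict

def borda_ordering(orderings):
    """Aggregate rankings: an item in position p of N gets N - p points (p from 0)."""
    scores = defaultdict(int)
    for ordering in orderings:
        n = len(ordering)
        for position, item in enumerate(ordering):
            scores[item] += n - position
    # Highest total first; ties broken alphabetically (an arbitrary choice).
    return sorted(scores, key=lambda item: (-scores[item], item))

responses = [r.split() for r in
             ["D A B C", "A B D C", "B A D C", "A C B D", "A D B C"]]
print(borda_ordering(responses))  # ['A', 'B', 'D', 'C']
```

For these five responses the Borda totals are A = 18, B = 13, D = 12, C = 7, so the Borda ordering is A B D C.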

Model Comparison

[Figure: performance of the Thurstonian model, Borda count, and individuals]

Experiment 2

78 participants

17 problems, each with 10 items: chronological events; physical measures; purely ordinal problems, e.g. Ten Amendments, Ten Commandments

Ordering states west-east


Oregon (1)

Utah (2)

Nebraska (3)

Iowa (4)

Alabama (6)

Ohio (5)

Virginia (7)

Delaware (8)

Connecticut (9)

Maine (10)

[Figure: scatter plot, R=0.961]

Ordering Ten Amendments


Freedom of speech & religion (1)

Right to bear arms (2)

No quartering of soldiers (4)

No unreasonable searches (3)

Due process (5)

Trial by Jury (6)

Civil Trial by Jury (7)

No cruel punishment (8)

Right to non-specified rights (10)

Power for the States & People (9)

[Figure: ten amendments scatter plot, R=0.889]

Ordering Ten Commandments


Worship any other God (1)

Make a graven image (7)

Take the Lord's name in vain (2)

Break the Sabbath (3)

Dishonor your parents (4)

Murder (6)

Commit adultery (8)

Steal (5)

Bear false witness (9)

Covet (10)

[Figure: scatter plot, R=0.722]

Average results over 17 Problems

[Figure: mean performance of the Thurstonian model, Borda count, mode, and individuals]

Effect of Group Composition

How many individuals do we need to average over?


Effect of Group Size: random groups

[Figure: performance as a function of group size for randomly formed groups; legend entries: T=0, T=2, T=12]

Experts vs. Crowds

Can we find experts in the crowd? Can we form small groups of experts?

Approach: form a group for some particular task; select individuals with the smallest sigma (“experts”) based on previous tasks; vary the number of previous tasks
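The selection step in this approach amounts to sorting people by their inferred sigma and keeping the top few. A minimal sketch, assuming sigma estimates from previous tasks are already available; the person labels and values below are hypothetical.

```python
def select_experts(sigma_by_person, group_size):
    """Return the `group_size` people with the smallest inferred sigma."""
    ranked = sorted(sigma_by_person, key=sigma_by_person.get)
    return ranked[:group_size]

# Hypothetical sigma estimates from previous tasks.
sigma_by_person = {"p1": 0.8, "p2": 0.3, "p3": 1.6, "p4": 0.5}
print(select_experts(sigma_by_person, 2))  # ['p2', 'p4']
```

No ground truth enters this selection: sigma is inferred by the model from the responses alone.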

Group Composition based on prior performance

[Figure: performance as a function of group size (best individuals first), one curve per number of previous tasks; legend entries: T = 0, T = 2, T = 8, T = 12]

Methods for Selecting Experts

Endogenous: no feedback required

Exogenous: selecting people based on actual performance

[Figure: two panels, one per selection method]

Model incorporating overall person ability

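Latent ground truth for task m: $\mu_m$

Overall ability: $\sigma_j \sim \mathrm{Gamma}(\ldots, 1/\ldots)$

Task specific ability: $\sigma_{jm} \sim \mathrm{Gamma}(\ldots, 1/\ldots)$, parameterized by the overall ability $\sigma_j$

Mental representation: $x_{ijm} \mid \mu_m, \sigma_{jm} \sim \mathrm{N}(\mu_m, \sigma_{jm})$

Observed ordering: $y_{jm} = \mathrm{rank}(x_{jm})$

Plates: j individuals, m tasks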

Average results over 17 Problems

[Figure: mean performance of Thurstonian Model v1, Thurstonian Model v2 (new model), Borda count, mode, and individuals]

Part II: Ordering Problems in Episodic Memory

Another ordering problem:

http://www.youtube.com/watch?v=29VGZtnCD30&feature=related

Items A, B, C, D to be ordered along a time axis

Experiment 3

26 participants

6 videos: 3 videos with stereotyped event sequences (e.g. wedding); 3 “unpredictable” videos (e.g., example video); extracted 10 stills for testing

Method: study video followed by immediate ordering test of 10 items

Bayesian Thurstonian Model

yogurt commercial (τ = 3):

event1 (1)
event2 (2)
event3 (3)
event4 (4)
event5 (7)
event6 (6)
event7 (5)
event8 (8)
event9 (9)
event10 (10)

[Figure: scatter plot, R=0.890]

Two other examples

clay animation (τ = 1):

event1 (1)
event2 (2)
event3 (3)
event4 (4)
event5 (6)
event6 (5)
event7 (7)
event8 (8)
event9 (9)
event10 (10)

wedding (τ = 0):

event1 (1)
event2 (2)
event3 (3)
event4 (4)
event5 (5)
event6 (6)
event7 (7)
event8 (8)
event9 (9)
event10 (10)

Overall Results

[Figure: mean performance of the Thurstonian model, Borda count, mode, and individuals]

Part III: Matching Problems

Example Matching Problem (one-to-one)


Dutch

Danish

Yiddish

Thai

Vietnamese

Chinese

Georgian

Russian

Japanese

A

B

C

D

E

F

G

H

I

godt nytår

gelukkig nieuwjaar

a gut yohr

С Новым Годом

สวัสดีปีใหม่

Chúc Mừng Nǎm Mới

გილოცავთ ახალ წელს

Experiment

17 Participants

8 matching problems, e.g.: car logos and brand names; first and last names of philosophers; flags and countries; Greek symbols and letter names

Number of items varied between 10 and 24; with 24 items, we have 24! possibilities

Overall Results

[Figure: mean accuracy (0 to 1) of each of the 17 individuals]

Heuristic Aggregation Approach

Combinatorial optimization problem: maximize agreement in assigning N items to N responses

Hungarian algorithm: construct a count matrix M, where M_ij = number of people that paired item i with response j; find row and column permutations to maximize the diagonal sum; O(n^3)
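The objective that the Hungarian algorithm optimizes can be illustrated on a small count matrix. The sketch below is not the Hungarian algorithm: it uses exhaustive search over permutations, which is only feasible for a handful of items, and the function name is illustrative. The counts are the upper-left 4 x 4 block of the language-matching matrix shown in this deck.

```python
from itertools import permutations

def best_assignment(counts):
    """Exhaustively find the column assignment with the largest total count.

    counts[i][j] = number of people who paired item i with response j.
    Returns (assignment, total) where assignment[i] is the response for item i.
    """
    n = len(counts)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(counts[i][perm[i]] for i in range(n)),
    )
    return list(best), sum(counts[i][best[i]] for i in range(n))

# Rows: gelukkig Nieuwjaar, godt nytår, bonne année, Japanese greeting.
# Columns: Dutch, Danish, French, Japanese.
counts = [
    [7, 3, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 14, 0],
    [0, 0, 0, 9],
]
print(best_assignment(counts))  # ([0, 1, 2, 3], 33)
```

For realistic sizes (up to 24 items here) a polynomial-time solver is needed; SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True` is one readily available option.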

Hungarian Algorithm Example

Count matrix M: rows are greetings, columns are languages; cells highlighted as correct or incorrect

Columns, in order: Dutch, Danish, French, Japanese, Spanish, Arabic, Chinese, German, Italian, Russian, Thai, Vietnamese, Welsh, Georgian, Yiddish

gelukkig Nieuwjaar: 7 3 0 0 0 1 0 0 0 0 0 0 2 0 2
godt nytår: 2 3 0 0 0 0 0 2 0 2 0 0 1 3 2
bonne année: 0 0 14 0 1 0 0 0 0 0 0 0 0 0 0
[Japanese greeting]: 0 0 0 9 0 0 2 0 1 0 3 0 0 0 0
feliz año nuevo: 0 0 0 0 14 0 0 0 0 0 1 0 0 0 0
عام سعيد: 0 1 0 0 0 14 0 0 0 0 0 0 0 0 0
[Chinese greeting]: 0 0 0 2 0 0 12 0 0 0 0 1 0 0 0
ein gutes neues Jahr: 3 1 0 0 0 0 0 9 0 0 0 0 1 0 1
felice anno nuovo: 0 0 0 0 0 0 0 0 14 1 0 0 0 0 0
С Новым Годом: 0 0 1 0 0 0 0 0 0 11 0 0 1 2 0
สวัสดีปีใหม่: 0 0 0 1 0 0 1 0 0 0 7 1 1 4 0
Chúc Mừng Nǎm Mới: 0 0 0 0 0 0 0 0 0 1 0 11 1 2 0
Blwyddyn Newydd Dda: 0 4 0 1 0 0 0 0 0 0 1 0 6 1 2
გილოცავთ ახალ წელს: 0 0 0 2 0 0 0 1 0 0 3 2 0 1 6
a gut yohr: 3 3 0 0 0 0 0 3 0 0 0 0 2 2 2

Hungarian Algorithm Results (2)

Count matrix (rows: painters; columns: the 10 response options)

Leonardo da Vinci: 9 5 1 0 0 0 0 0 0 0
Jan Vermeer: 0 7 3 0 2 2 0 0 1 0
Rembrandt van Rijn: 1 0 4 3 1 1 1 1 3 0
Pablo Picasso: 0 0 1 6 6 0 0 1 1 0
Vincent van Gogh: 0 0 0 1 6 4 0 1 2 1
Renoir: 1 0 1 0 0 3 7 2 1 0
Monet: 0 1 1 0 0 2 5 3 0 3
Jan Van Eyck: 0 2 3 2 0 0 0 5 3 0
Edvard Munch: 0 0 1 0 0 3 0 2 4 5
Salvador Dali: 4 0 0 3 0 0 2 0 0 6

Bayesian Matching Model

Proposed process:

- match “known” items
- guess between remaining ones

Individual differences:

- some items easier to know
- some participants know more

Example: Dutch, Danish, Yiddish, Russian to be matched with godt nytår, gelukkig nieuwjaar, a gut yohr, С Новым Годом

Graphical Model

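Latent ground truth: $z$

Item easiness: $d_i$ (plate over i items)

Person ability: $a_j$ (plate over j individuals)

Prob. of knowing: $\mathrm{logit}(s_{ij}) = d_i + a_j$

Knowledge state: $x_{ij} \sim \mathrm{Bernoulli}(s_{ij})$

Observed matching: $p(y_{ij} \mid z, x_{ij}) = 1$ if $x_{ij} = 1$, and $1/n!$ if $x_{ij} = 0$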
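The know-or-guess process behind this model can be simulated directly. This is a minimal sketch under stated assumptions: the probability of knowing an item is taken to be the logistic function of item easiness plus person ability, guessing is a uniform shuffle over the responses left after the known items are matched, and all numeric values are hypothetical.

```python
import math
import random

def simulate_matching(truth, easiness, ability, rng):
    """Simulate one person's matching under the know-or-guess process.

    truth[i]    = correct response for item i
    easiness[i] = d_i, ability = a_j; P(know item i) = logistic(d_i + a_j)
    """
    n = len(truth)
    known = [
        rng.random() < 1.0 / (1.0 + math.exp(-(easiness[i] + ability)))
        for i in range(n)
    ]
    answer = [truth[i] if known[i] else None for i in range(n)]
    # Responses not claimed by known items are shuffled over the unknown items.
    remaining = [truth[i] for i in range(n) if not known[i]]
    rng.shuffle(remaining)
    for i in range(n):
        if answer[i] is None:
            answer[i] = remaining.pop()
    return answer

rng = random.Random(0)
truth = ["Danish", "Dutch", "Yiddish", "Russian"]
easiness = [0.0, 0.0, -1.0, 2.0]  # hypothetical d_i values
print(simulate_matching(truth, easiness, ability=0.5, rng=rng))
```

Because guesses are a permutation of the leftover responses, a guessed item can still come out correct by chance, which is why accuracy alone does not reveal the knowledge state.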

Overall Modeling Results

[Figure: mean accuracy of Bayesian Matching, the Hungarian Algorithm, and individuals]

Calibration at level of items and people (for paintings problem)

ITEMS: D (inferred) vs. D (actual)

Greek symbols: R=0.953
Philosophers: R=0.978
Flags: R=0.973
Paintings: R=0.916
US presidents: R=0.960
Car logos: R=0.918
Languages: R=0.947
Sport balls: R=0.963

INDIVIDUALS: A (inferred) vs. A (actual)

Greek symbols: R=0.990
Philosophers: R=0.992
Flags: R=0.987
Paintings: R=0.975
US presidents: R=0.992
Car logos: R=0.992
Languages: R=0.968
Sport balls: R=0.995

How predictive are subject provided confidence ratings?

[Figure: two panels of accuracy (0 to 1) against number of guesses (0, 1-2, 3-4, 5+). First panel: # guesses estimated by individual, r=-.42. Second panel: # guesses estimated by model (based on variable A), r=-.77]

Part IV: Open Issues

When do we get wisdom of crowds effect?

Independent errors: different people knowing different things

Population response centered around ground truth

Some minimal number of individuals: 10-20 individuals often sufficient

What are methods for finding experts?

1) Self-reported expertise: unreliable; has led to claims of “myth of expertise”

2) Based on explicit scores by comparing to ground truth, but ground truth might not be immediately available

3) Endogenously discover experts: use the crowd to discover experts; small groups of experts can be effective

What to do about systematic biases?

In some tasks, individuals systematically distort the ground truth: spatial and temporal distortions; memory distortions (e.g. false memory); decision-making distortions

Does this diminish the wisdom of crowds effect? Maybe… but a model that predicts these systematic distortions might be able to “undo” them

Can we build domain specific models?

Thurstonian model applied to wide variety of problems

How about domain specific models? E.g., apply serial recall models to serial recall: better specify sources of noise; model systematic biases

That’s all


Do the experiments yourself:

http://psiexp.ss.uci.edu/


Other slides


Results separated by problem

Column groups: Humans (PC, τ*), then Thurstone v1, Thurstone v2, Mallows Model, Borda Counts, Mode (C, τ, Rank each)

Problem                | Humans     | Thurstone v1   | Thurstone v2   | Mallows Model  | Borda Counts   | Mode
                       | PC    τ*   | C    τ    Rank | C    τ    Rank | C    τ    Rank | C    τ    Rank | C    τ    Rank
books                  | .000  10   | 0    6    88   | 0    6    88   | 0    5    91   | 0    7    82   | 0    12   40
city population europe | .000  15   | 0    12   77   | 0    12   77   | 0    12   77   | 0    11   81   | 0    17   42
city population us     | .000  14   | 0    9    90   | 0    8    91   | 0    7    96   | 0    12   67   | 0    16   45
city population world  | .000  18   | 0    16   73   | 0    16   73   | 0    16   73   | 0    15   77   | 0    19   44
country landmass       | .000  9    | 0    5    95   | 0    6    85   | 0    5    95   | 0    5    95   | 0    7    76
country population     | .000  15   | 0    10   87   | 0    10   87   | 0    11   82   | 0    11   82   | 0    15   53
hardness               | .000  15   | 0    14   64   | 0    13   73   | 0    14   64   | 0    11   91   | 0    15   46
holidays               | .051  8    | 0    5    77   | 0    5    77   | 0    5    77   | 0    4    78   | 1    0    100
movies releasedate     | .013  6    | 0    1    99   | 0    1    99   | 0    2    95   | 0    2    95   | 0    2    95
oscar bestmovies       | .013  10   | 0    4    90   | 0    3    97   | 0    4    90   | 0    3    97   | 0    3    97
oscar movies           | .000  10   | 0    1    100  | 0    1    100  | 0    1    100  | 0    2    96   | 0    2    96
presidents             | .064  7    | 0    1    94   | 1    0    100  | 0    1    94   | 0    3    79   | 1    0    100
rivers                 | .000  15   | 0    11   91   | 0    12   86   | 0    14   67   | 0    11   91   | 0    16   42
states westeast        | .026  6    | 0    1    97   | 0    1    97   | 0    2    88   | 0    3    78   | 0    1    97
superbowl              | .000  17   | 0    13   86   | 0    12   88   | 0    15   71   | 0    10   96   | 0    19   40
ten amendments         | .013  13   | 0    2    97   | 0    1    99   | 0    3    96   | 0    5    90   | 0    4    95
ten commandments       | .000  17   | 0    7    91   | 0    7    91   | 0    7    91   | 0    12   74   | 0    17   51
AVERAGE                | .011  12.1 | .00  6.94 88.0 | .06  6.71 88.8 | .00  7.29 85.1 | .00  7.47 85.3 | .12  9.67 68.2

Notes

Noise in Thurstonian models: acquisition / encoding noise; retrieval noise

Link to crowd within (Ed Vul): are our results due to wisdom of crowds or individuals?

Probably a bit of both, and we cannot tell with our experiments

However, there is probably a fair amount of encoding noise that would not benefit from repeated measurements within individuals

Different individuals probably do know different things

To Do

Compare explicitly estimated number of guesses with latent confidence

Identifiability issue: fix mean A?

Hierarchical model: test on small numbers of subjects

Model comparisons on small sets of subjects

TO DO: look at kurtosis of sigma distributions

Modeling Group Serial Recall

Goal: infer distribution over orderings of events given verbal reports, i.e., P( original order | verbal report )

Many models for serial recall, e.g. Estes Perturbation model (1972), Shiffrin & Cook (1978), SOB (2002), SIMPLE (2007)

but many of these models do not have a likelihood function p( item 1, item 2, …, item N | memory contents )

Bayesian Algorithm: not every person has equal weight

[Table: the 15 × 15 language-matching count matrix from the Hungarian Algorithm Example, with cells highlighted as correct or incorrect]

Summary of Findings

Extended wisdom of crowds to combinatorial problems: approximate inference (MCMC) to infer probability distributions over permutations

Bayesian methods that are calibrated: we can tell who is likely to be accurate without having ground truth available


When do we get Wisdom of Crowds effect?

Analyze model performance in a variety of tasks


MDS solution of pairwise tau distances

[Figure: MDS solution for the states west-east problem, showing individuals (each labeled with its distance to the truth), the truth, and the Thurstonian model]

MDS solution of pairwise tau distances

[Figure: MDS solution for the ten commandments problem, showing individuals, the truth, and the Thurstonian model]

Modeling Performance Across Task

Current model is applied independently across tasks

Extend hierarchical model with random effects approach to tasks: each person has an overall ability (Spearman’s “g”); ability in a specific task varies around overall ability
