Optimal predictions in everyday cognition
Tom Griffiths (Brown University)
Josh Tenenbaum (MIT)

Outline:
• Predicting the future
• Optimality and Bayesian inference
• Results
• The effects of prior knowledge


Many people believe that perception is optimal…


…but cognition is not.

Bayes' rule:

p(h | d) = p(d | h) p(h) / Σ_{h′ ∈ H} p(d | h′) p(h′)

posterior probability = likelihood × prior probability, normalized by a sum over the space of hypotheses H

h: hypothesis
d: data
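Bayes' rule as written above can be checked numerically. The sketch below uses a made-up three-hypothesis example (the priors and likelihoods are arbitrary illustration values, not anything from the talk):

```python
import numpy as np

# Hypothetical example: three hypotheses with made-up priors and likelihoods.
prior = np.array([0.5, 0.3, 0.2])        # p(h) for h1, h2, h3
likelihood = np.array([0.1, 0.4, 0.9])   # p(d | h) for the observed data d

# Bayes' rule: multiply likelihood by prior, then normalize by the sum
# over the whole hypothesis space H.
posterior = likelihood * prior / np.sum(likelihood * prior)

print(posterior.round(3))   # p(h | d); h3 now dominates despite its low prior
```

Note how the data can overturn the prior ordering: h3 starts with the lowest prior but ends with the highest posterior because its likelihood is much larger.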

A puzzle
If they do not use priors, how do people…
• predict the future
• infer causal relationships
• identify the work of chance
• assess similarity and make generalizations
• learn languages and concepts
…and solve other inductive problems?

Drawing strong conclusions from limited data requires using prior knowledge


How often is Google News updated?

t = time since last update
t_total = time between updates

What should we guess for t_total given t?

More generally…
• You encounter a phenomenon that has existed for t units of time. How long will it continue into the future? (i.e., what is t_total?)
• We could replace "time" with any other variable that ranges from 0 to some unknown upper limit.

Everyday prediction problems
• You read about a movie that has made $60 million to date. How much money will it make in total?

• You see that something has been baking in the oven for 34 minutes. How long until it’s ready?

• You meet someone who is 78 years old. How long will they live?

• Your friend quotes to you from line 17 of his favorite poem. How long is the poem?

• You see taxicab #107 pull up to the curb in front of the train station. How many cabs are there in this city?

Bayesian inference
p(t_total | t) ∝ p(t | t_total) p(t_total)
(posterior probability ∝ likelihood × prior)

Assuming t is sampled at random from the interval [0, t_total], the likelihood is p(t | t_total) = 1/t_total for t_total ≥ t.

What is the best guess for t_total? (call it t*)
[Figure: the posterior p(t_total | t) as a function of t_total, peaked at t_total = t]
Not the maximal value of p(t_total | t) (that is just t* = t).
Example: t ≈ 4000 years, t* ≈ 8000 years
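The mode-versus-median point can be checked on a grid. This is a sketch assuming the likelihood p(t | t_total) = 1/t_total and the uninformative 1/t_total prior the talk attributes to Gott; the grid bounds and resolution are arbitrary choices:

```python
import numpy as np

t = 4000.0                                   # observed age of the phenomenon
grid = np.linspace(t, 200 * t, 1_000_000)    # candidate t_total values (>= t)

# Posterior: likelihood (1/t_total) times the 1/t_total prior, normalized
# on the uniform grid.
post = (1.0 / grid) ** 2
post /= post.sum()

mode = grid[post.argmax()]                   # maximal value sits at t_total = t
cdf = np.cumsum(post)
median = grid[np.searchsorted(cdf, 0.5)]     # the best guess t*

print(mode, median)    # mode is 4000; median is close to 8000
```

The posterior peaks right at t_total = t, but half its mass lies beyond roughly 2t, which is why the median, not the mode, is the sensible guess.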


Predicting everyday events
• This seems like a good strategy…
  – You meet someone who is 35 years old. How long will they live?
  – "70 years" seems reasonable.
• But it's not so simple:
  – You meet someone who is 78 years old. How long will they live?
  – You meet someone who is 6 years old. How long will they live?

The effects of priors

Evaluating human predictions
• Different domains with different priors:
  – a movie has made $60 million [power-law]
  – your friend quotes from line 17 of a poem [power-law]
  – you meet a 78-year-old man [Gaussian]
  – a movie has been running for 55 minutes [Gaussian]
  – a U.S. congressman has served for 11 years [Erlang]
• Prior distributions derived from actual data
• Five values of t used for each scenario
• People predict t_total
• A total of 350 participants and ten scenarios
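How the form of the prior shapes the prediction can be sketched numerically. The parameter values below (a power-law exponent of 1, a Gaussian life-span prior with mean 75 and SD 16, and the grid bounds) are illustrative assumptions, not the study's fitted values:

```python
import numpy as np

def posterior_median(t, prior_pdf, upper=10_000.0, n=400_000):
    """Posterior median of t_total given t, with likelihood 1/t_total."""
    grid = np.linspace(t, upper, n)          # support: t_total >= t
    post = prior_pdf(grid) / grid            # likelihood * prior (unnormalized)
    cdf = np.cumsum(post)
    cdf /= cdf[-1]
    return grid[np.searchsorted(cdf, 0.5)]

# Illustrative priors (assumed parameters, not the study's fits):
power_law = lambda x: 1.0 / x                             # e.g. movie grosses
gaussian = lambda x: np.exp(-0.5 * ((x - 75) / 16) ** 2)  # e.g. life spans

for t in (10, 40, 78):
    print(t, posterior_median(t, power_law), posterior_median(t, gaussian))
```

Under the power-law prior the prediction scales multiplicatively (roughly 2t here), while under the Gaussian prior it stays near the distribution's mean for small t and hugs t itself for large t, matching the 6-year-old versus 78-year-old intuition above.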

[Figure legend: people's predictions, parametric prior, empirical prior, Gott's rule]

Nonparametric priors

You arrive at a friend’s house, and see that a cake has been in the oven for 34 minutes. How long will it be in the oven?

People make good predictions despite the complex distribution

You learn that in ancient Egypt, there was a great flood in the 11th year of a pharaoh’s reign. How long did he reign?

How long did the typical pharaoh reign in ancient Egypt?

People identify the form, but are mistaken about the parameters

No direct experience

Conclusions
• People produce accurate predictions for the duration and extent of everyday events.
• People have strong prior knowledge:
  – form of the prior (power-law or exponential)
  – distribution given that form (parameters)
  – non-parametric distribution when necessary
• Reveals a surprising correspondence between probabilities in the mind and in the world, and suggests that people do use prior probabilities in making inductive inferences.

In particular, there is controversy over whether people's inferences follow Bayes' rule, which indicates how a rational agent should update beliefs about hypotheses h in light of data d. Several results suggest that people do not combine prior probabilities with data correctly (e.g., Tversky & Kahneman, 1974).

Strategy: examine the influence of prior knowledge in an inductive problem we solve every day

We use the posterior median t*, defined by P(t_total < t* | t) = 0.5.

What should we use as the prior, p(t_total)?

Gott (1993): use the uninformative prior p(t_total) ∝ 1/t_total.

This yields a simple prediction rule: t* = 2t.
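Gott's t* = 2t rule can be sanity-checked by simulation: draw t_total from a 1/t_total prior (truncated to a finite range, an arbitrary choice here), draw t uniformly within each interval, and look at the median t_total among the samples whose t lands near an observed value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000

# Sample t_total from p(t_total) ∝ 1/t_total on [1, 1000] via inverse CDF.
t_total = np.exp(rng.uniform(0.0, np.log(1000.0), n))

# Random sampling: t uniform on [0, t_total], so p(t | t_total) = 1/t_total.
t = rng.uniform(0.0, t_total)

# Condition on observing t near 10; the posterior median should be near 2t.
obs = 10.0
near = np.abs(t - obs) < 0.5
print(np.median(t_total[near]))   # close to 2 * obs
```

The simulated conditional median comes out close to 20, agreeing with the analytic rule; the small shortfall is due to truncating the prior at 1000.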