PROBABILISTIC MODELS OF LEARNING AND MEMORY
Uncertainty and Bayesian inference
MÁTÉ LENGYEL
Computational and Biological Learning Lab, Department of Engineering
University of Cambridge
http://www.eng.cam.ac.uk/~m.lengyel
CEU, Budapest, 22-26 June 2009

UNCONSCIOUS INFERENCES
listen to the words
Hermann von Helmholtz, 1867
[Figure: checker-shadow illusion (Adelson, unpubl)]
[Diagram labels: stimulus, percept, prior knowledge]

UNCONSCIOUS INFERENCES, CONT’D
listen to the words
seat, radio, table, rocking, bench, boat, chair
[Marks shown against the words on the slide: ✗✓ ✓✗✗✗✗] (Roediger & McDermott, 1995)
[Diagram labels: experience, memories, stimulus, percept, prior knowledge, learning]

FORMALISING INFERENCES
[Plot: belief (axis marked possible / impossible) over the shade of square A, then over the shade of square B; labels: physical luminance, knowledge of checkerboards]

REPRESENTING DEGREES OF BELIEFS AS PROBABILITIES
[Plot: P(shade of square B) over the shade of square B, axis starting at 0; labels: physical luminance, knowledge of checkerboards]
Dutch Book Theorem: If you are willing to bet on your beliefs, then unless they satisfy the axioms of probability, there will always be a set of bets (a “Dutch book”) that you would accept which is guaranteed to lose you money, no matter what the outcome is!
\[ \mathrm{odds}(\text{shade of square B} = x) = \frac{-\$ \text{ if shade of square B} \neq x}{+\$ \text{ if shade of square B} = x} = \frac{P(\text{shade of square B} = x)}{P(\text{shade of square B} \neq x)} \]
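
To make the Dutch book argument concrete, here is a minimal sketch. It assumes two mutually exclusive and exhaustive outcomes for square B (the labels "light" and "dark" are hypothetical) and an agent who regards a ticket paying $1 on an outcome as fairly priced at its stated belief. The function name and all numbers are illustrative, not taken from the lecture.

```python
def agent_net_gain(beliefs, outcome):
    """Net gain of an agent who buys one $1 ticket per event, each priced at its belief."""
    # Assumption: the agent accepts every bet it considers fair, i.e. price == belief.
    cost = sum(beliefs.values())
    winnings = sum(1.0 for event in beliefs if event == outcome)
    return winnings - cost

# Beliefs that violate normalisation (they sum to 1.2) ...
incoherent = {"light": 0.6, "dark": 0.6}
# ... and beliefs that satisfy it.
coherent = {"light": 0.6, "dark": 0.4}

# Evaluate the agent's net gain under every possible outcome.
incoherent_gains = [agent_net_gain(incoherent, o) for o in incoherent]
coherent_gains = [agent_net_gain(coherent, o) for o in coherent]

print("incoherent beliefs:", incoherent_gains)  # negative for every outcome
print("coherent beliefs:  ", coherent_gains)    # no guaranteed loss from this book
```

With beliefs summing to more than 1 the agent loses money whichever outcome occurs. If the beliefs summed to less than 1, a bookmaker could buy the same tickets from the agent at those prices and again profit in every outcome.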

THE AXIOMS OF PROBABILITY (HOW TO REPRESENT YOUR BELIEFS)
\(P(x|y)\): belief in x if we know y is true
properties:
  \(0 \le P(x) \le 1\)
  \(P(x) = 1 \Leftrightarrow\) x is certainly true
  \(P(x) = 0 \Leftrightarrow\) x is certainly not true
axioms:
  probabilities are non-negative: \(P(x) \ge 0\)
  probabilities are normalised: \(\sum_x P(x) = 1\)
joint probability:
  \(P(x, y) \rightarrow P(x) = \sum_y P(x, y)\) (marginal probability)
  \(P(x, y) = P(x) \cdot P(y) \Leftrightarrow\) x and y are independent
conditional probability by Bayes’ rule:
  \(P(x, y) = P(x|y) \cdot P(y) = P(y|x) \cdot P(x)\)
  \(P(x|y) = \frac{P(y|x) \, P(x)}{P(y)}\)
  posterior ∝ likelihood × prior
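
The rules on this slide can be checked on a small joint table. The sketch below uses a made-up joint distribution over weather and ground state (all labels and numbers are hypothetical) and exact fractions, so that normalisation, marginalisation, conditioning and Bayes' rule can be compared without rounding error.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(x, y): x is the weather, y is the state of the ground.
joint = {
    ("rain", "wet"): F(27, 100),
    ("rain", "dry"): F(3, 100),
    ("sun", "wet"): F(7, 100),
    ("sun", "dry"): F(63, 100),
}

def marginal_x(x):
    # P(x) = sum over y of P(x, y)
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    # P(y) = sum over x of P(x, y)
    return sum(p for (_, yi), p in joint.items() if yi == y)

def x_given_y(x, y):
    # P(x|y) = P(x, y) / P(y)
    return joint[(x, y)] / marginal_y(y)

def y_given_x(y, x):
    # P(y|x) = P(x, y) / P(x)
    return joint[(x, y)] / marginal_x(x)

total = sum(joint.values())                       # normalisation
posterior = x_given_y("rain", "wet")              # direct conditioning
bayes = y_given_x("wet", "rain") * marginal_x("rain") / marginal_y("wet")  # Bayes' rule
independent = all(joint[(x, y)] == marginal_x(x) * marginal_y(y) for (x, y) in joint)

print(total, posterior, bayes, independent)
```

In this table the two routes to P(rain | wet) agree, and the joint does not factorise into its marginals, so x and y are not independent.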

REPRESENTING DEGREES OF BELIEFS AS PROBABILITIES
[Plot: P(shade of square B) over the shade of square B, axis starting at 0; labels: physical luminance, knowledge of checkerboards]
\[ P(\text{shade of square B} \mid \text{luminance, checkerboard, shadows}) \propto P(\text{luminance of square B} \mid \text{shade of square B}) \times P(\text{shade of square B} \mid \text{checkerboard}) \]
posterior ∝ likelihood × prior
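
As a rough illustration of posterior ∝ likelihood × prior for the shade of square B, the sketch below works on a discrete grid of shades. Everything quantitative in it is an assumption made for illustration: the grid, the two-peaked prior standing in for "knowledge of checkerboards", the shadow illumination factor, the Gaussian noise model and the observed luminance.

```python
import math

# Candidate shades (reflectances) 0.1, 0.2, ..., 0.9.
shades = [i / 10 for i in range(1, 10)]

# Assumed prior: checkerboard squares are mostly dark (0.3) or light (0.8).
prior = [0.4 if i in (3, 8) else 0.2 / 7 for i in range(1, 10)]

illumination = 0.5         # assumed: square B lies in shadow
observed_luminance = 0.35  # assumed measurement
noise_sd = 0.1             # assumed sensory noise

def likelihood(shade):
    # P(luminance | shade): Gaussian noise around illumination * shade (up to a constant).
    predicted = illumination * shade
    return math.exp(-0.5 * ((observed_luminance - predicted) / noise_sd) ** 2)

unnormalised = [likelihood(s) * p for s, p in zip(shades, prior)]
evidence = sum(unnormalised)
posterior = [u / evidence for u in unnormalised]

ml_shade = max(shades, key=likelihood)                 # ignores prior knowledge
map_shade = shades[posterior.index(max(posterior))]    # combines both sources

print("maximum-likelihood shade:", ml_shade)
print("maximum a posteriori shade:", map_shade)
```

Under these assumptions the luminance alone favours a shade of 0.7, while the posterior peaks at the nearby "light square" value of 0.8, which is one way to read the slide's point that prior knowledge shapes the percept.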

BAYESIAN DECISION THEORY (HOW TO MAKE POINT ESTIMATES)
loss function:
            state of the world
action      x1          x2          x3          ...
a1          L(a1,x1)    L(a1,x2)    L(a1,x3)
a2          L(a2,x1)    L(a2,x2)    L(a2,x3)
a3          L(a3,x1)    L(a3,x2)    L(a3,x3)
...
action to choose: \(a^* = \operatorname{argmin}_a \sum_x L(a, x) \, P(x)\)
note: a and x need not live in the same space
special cases when a and x do live in the same space, \(a = \hat{x}\):
  \(L(\hat{x}, x) = (\hat{x} - x)^2\) gives \(\hat{x} = \sum_x x \, P(x)\) (posterior mean)
  \(L(\hat{x}, x) = |\hat{x} - x|\) gives \(\sum_{x=-\infty}^{\hat{x}} P(x) = \frac{1}{2}\) (posterior median)
  \(L(\hat{x}, x) = 0\) if \(x = \hat{x}\), 1 otherwise, gives \(\hat{x} = \operatorname{argmax}_x P(x)\) (maximum a posteriori, MAP)
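
The three special cases can be checked by brute force. The sketch below takes a small, deliberately skewed distribution P(x) (the values are made up) and minimises the expected loss over a grid of candidate estimates, then compares the minimisers with the mean, median and mode computed directly. The grid resolution and all names are illustrative choices.

```python
xs = [0, 1, 2, 3, 4]
p = [0.4, 0.3, 0.1, 0.1, 0.1]             # hypothetical posterior P(x)
candidates = [k / 10 for k in range(41)]  # candidate estimates 0.0, 0.1, ..., 4.0

def best_action(loss):
    """Return the candidate a minimising sum_x L(a, x) P(x)."""
    def expected_loss(a):
        return sum(loss(a, x) * px for x, px in zip(xs, p))
    return min(candidates, key=expected_loss)

def squared(a, x):
    return (a - x) ** 2

def absolute(a, x):
    return abs(a - x)

def zero_one(a, x):
    return 0.0 if a == x else 1.0

def median():
    # Smallest x whose cumulative probability reaches one half.
    cumulative = 0.0
    for x, px in zip(xs, p):
        cumulative += px
        if cumulative >= 0.5:
            return x

mean = sum(x * px for x, px in zip(xs, p))
mode = xs[p.index(max(p))]

print("squared loss :", best_action(squared), "mean  :", mean)
print("absolute loss:", best_action(absolute), "median:", median())
print("0-1 loss     :", best_action(zero_one), "mode  :", mode)
```

Because the distribution is skewed, the three estimates differ, which shows that the choice of loss function matters for the reported point estimate.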

EXAMPLE: PREDICTING LIFE SPAN
You meet someone who is t years old. What will be his total life span \(t_{total}\)?
\[ P(t_{total} \mid t) \propto P(t \mid t_{total}) \, P(t_{total}) \]
\(P(t \mid t_{total})\): the probability that you meet someone at the age of t when s/he will have a total life span of \(t_{total}\)
\[ P(t \mid t_{total}) = \begin{cases} 1/t_{total} & \text{if } t < t_{total} \\ 0 & \text{otherwise} \end{cases} \]
\(P(t_{total})\): prior on the life span distribution of people
+ decision theory: e.g. report the median of the posterior
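
A minimal numerical sketch of this prediction follows. It assumes the likelihood is 1/t_total for t < t_total and takes a Gaussian prior with mean 75 and standard deviation 16 years, the values reported by Griffiths & Tenenbaum (2006). The grid step, the upper cut-off of 200 years and the function name are arbitrary choices made for illustration.

```python
import math

def predict_total(t, mean=75.0, sd=16.0, step=0.01, t_max=200.0):
    """Posterior median of t_total given current age t, computed on a grid."""
    n = int(round((t_max - t) / step))
    grid = [t + (i + 0.5) * step for i in range(n)]  # midpoints, all above t
    # Unnormalised posterior: likelihood (1 / t_total) times Gaussian prior.
    weights = [math.exp(-0.5 * ((x - mean) / sd) ** 2) / x for x in grid]
    half = 0.5 * sum(weights)
    running = 0.0
    for x, w in zip(grid, weights):
        running += w
        if running >= half:
            return x  # first grid point where the cumulative mass reaches one half
    return grid[-1]

for age in (18, 39, 51, 75, 96):
    print(age, round(predict_total(age), 1))
```

Under these assumptions the predictions for young ages stay close to the prior mean, while for ages beyond the mean the prediction remains only somewhat above the current age.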
[...] analogous to the Copernican anthropic principle in Bayesian cosmology (Buch, 1994; Caves, 2000; Garrett & Coles, 1993; Gott, 1993, 1994; Ledford, Marriott, & Crowder, 2001) and the generic-view principle in Bayesian models of visual perception (Freeman, 1994; Knill & Richards, 1996). The prior probability \(p(t_{total})\) reflects our general expectations about the relevant class of events: in this case, about how likely it is that a man’s life span will be \(t_{total}\). Analysis of actuarial data shows that the distribution of life spans in our society is (ignoring infant mortality) approximately Gaussian, that is, normally distributed, with a mean, \(\mu\), of about 75 years and a standard deviation, \(\sigma\), of about 16 years.

Combining the prior with the likelihood according to Equation 1 yields a probability distribution \(p(t_{total}|t)\) over all possible total life spans \(t_{total}\) for a man encountered at age t. A good guess for \(t_{total}\) is the median of this distribution, that is, the point at which it is equally likely that the true life span is longer or shorter. Taking the median of \(p(t_{total}|t)\) defines a Bayesian prediction function, specifying a predicted value of \(t_{total}\) for each observed value of t. Prediction functions for events with Gaussian priors are nonlinear: For values of t much less than the mean of the prior, the predicted value of \(t_{total}\) is approximately the mean; once t approaches the mean, the predicted value of \(t_{total}\) increases slowly, converging to t as t increases but always remaining slightly higher, as shown in Figure 1. Although its mathematical form is complex, this prediction function makes intuitive sense for human life spans: A predicted life span of about 75 years would be reasonable for a man encountered at age 18, 39, or 51; if we met a man at age 75, we might be inclined to give him several more years at least; but if we met someone at age 96, we probably would not expect him to live much longer.

This approach to prediction is quite general, applicable to any problem that requires estimating the upper limit of a duration, extent, or other numerical quantity given a sample drawn from that interval (Buch, 1994; Caves, 2000; Garrett & Coles, 1993; Gott, 1993, 1994; Jaynes, 2003; Jeffreys, 1961; Ledford et al., 2001; Leslie, 1996; Maddox, 1994; Shepard, 1987; Tenenbaum & Griffiths, 2001). However, different priors will be appropriate for different kinds of phenomena, and the prediction function will vary substantially as a result. For example, imagine trying to predict the total box-office gross of a movie given its take so far. The total gross of movies follows a power-law distribution, with \(p(t_{total}) \propto t_{total}^{-\gamma}\) for some \(\gamma > 0\).¹ This distribution has a highly non-Gaussian shape (see Fig. 1), with most movies taking in only modest amounts, but occasional blockbusters making huge amounts of money. In the appendix, we show that for power-law priors, the Bayesian prediction function picks a value for \(t_{total}\) that is a multiple of the observed sample t. The exact multiple depends on the parameter \(\gamma\). For the particular power law that best fits the actual distribution of movie grosses, an optimal Bayesian observer would estimate the total gross to be approximately 50% greater than the current gross: Thus, if we observe a movie has made $40 million to date, we should guess a total gross of around $60 million; if we observe a current gross of only $6 million, we should guess about $9 million for the total.

Although such constant-multiple prediction rules are optimal for event classes that follow power-law priors, they are clearly inappropriate for predicting life spans or other kinds of events with Gaussian priors. For instance, upon meeting a 10-year-old girl and her 75-year-old grandfather, we would never predict that the girl will live a total of 15 years (1.5 × 10) and the grandfather will live to be 112 (1.5 × 75). Other classes of priors, such as the exponential-tailed Erlang distribution, \(p(t_{total}) \propto t_{total} \exp(-t_{total}/\beta)\) for \(\beta > 0\),² are also associated with distinctive optimal prediction functions. For the Erlang distribution, the [...]

Fig. 1. Bayesian prediction functions and their associated prior distributions. The three columns represent qualitatively different statistical models appropriate for different kinds of events. The top row of plots shows three parametric families of prior distributions for the total duration or extent, \(t_{total}\), that could describe events in a particular class. Lines of different styles represent different parameter values (e.g., different mean durations) within each family. The bottom row of plots shows the optimal predictions for \(t_{total}\) as a function of t, the observed duration or extent of an event so far, assuming the prior distributions shown in the top panel. For Gaussian priors (left column), the prediction function always has a slope less than 1 and an intercept near the mean \(\mu\): Predictions are never much smaller than the mean of the prior distribution, nor much larger than the observed duration. Power-law priors (middle column) result in linear prediction functions with variable slope and a zero intercept. Erlang priors (right column) yield a linear prediction function that always has a slope equal to 1 and a nonzero intercept.

¹ When \(\gamma > 1\), a power-law distribution is often referred to in statistics and economics as a Pareto distribution.
² The Erlang distribution is a special case of the gamma distribution. The gamma distribution is \(p(t_{total}) \propto t_{total}^{k-1} \exp(-t_{total}/\beta)\), where \(k > 0\) and \(\beta > 0\) are real numbers. The Erlang distribution assumes that k is an integer. Following Shepard (1987), we use a one-parameter Erlang distribution, fixing k at 2.

(Griffiths & Tenenbaum, 2006, p. 768)
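
The constant-multiple rule for power-law priors can be derived in a few lines. The sketch below assumes the likelihood \(p(t \mid t_{total}) = 1/t_{total}\) for \(t < t_{total}\) and writes \(m\) for the posterior median; it is a reconstruction under those assumptions, not a copy of the paper's appendix.

```latex
\begin{align*}
p(t_{total} \mid t) &\propto \frac{1}{t_{total}} \, t_{total}^{-\gamma}
  = t_{total}^{-(\gamma + 1)}, \qquad t_{total} > t, \\
P(t_{total} > m \mid t) &= \frac{\int_m^{\infty} x^{-(\gamma + 1)} \, dx}
                                {\int_t^{\infty} x^{-(\gamma + 1)} \, dx}
  = \left( \frac{t}{m} \right)^{\gamma}, \\
\left( \frac{t}{m} \right)^{\gamma} = \frac{1}{2}
  \quad &\Rightarrow \quad m = 2^{1/\gamma} \, t .
\end{align*}
```

Under these assumptions a multiple of 1.5 would correspond to \(\gamma = \ln 2 / \ln 1.5 \approx 1.71\). The value actually fitted to movie grosses is not stated in the excerpt.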

EVERYDAY PREDICTIONS
[...] best guess of \(t_{total}\) is simply t plus a constant determined by the parameter \(\beta\), as shown in the appendix and illustrated in Figure 1.

Our experiment compared these ideal Bayesian analyses with the judgments of a large sample of human participants, examining whether people’s predictions were sensitive to the distributions of different quantities that arise in everyday contexts. We used publicly available data to identify the true prior distributions for several classes of events (the sources of these data are given in Table 1). For example, as shown in Figure 2, human life spans and the run time of movies are approximately Gaussian, the gross of movies and the length of poems are approximately power-law distributed, and the distributions of the number of years in office for members of the U.S. House of Representatives and of the length of the reigns of pharaohs are approximately Erlang. The experiment examined how well people’s predictions corresponded to optimal statistical inference in these different settings.

METHOD

Participants and Procedure
Participants were tested in two groups, with each group making predictions about five different phenomena. One group of 208 undergraduates made predictions about movie grosses, poem lengths, life spans, reigns of pharaohs, and lengths of marriages. A second group of 142 undergraduates made predictions about movie run times, terms of U.S. representatives, baking times for cakes, waiting times, and lengths of marriages. The surveys were [...]

TABLE 1
Sources of Data for Estimating Prior Distributions

Data set                      Source (number of data points)
Movie grosses                 http://www.worldwideboxoffice.com/ (5,302)
Poem lengths                  http://www.emule.com/ (1,000)
Life spans                    http://www.demog.berkeley.edu/wilmoth/mortality/states.html (complete life table)
Movie run times               http://www.imdb.com/charts/usboxarchive/ (233 top-10 movies from 1998 through 2003)
U.S. representatives’ terms   http://www.bioguide.congress.gov/ (2,150 members since 1945)
Cake baking times             http://www.allrecipes.com/ (619)
Pharaohs’ reigns              http://www.touregypt.com/ (126)

Note. Data were collected from these Web sites between July and December 2003.

Fig. 2. People’s predictions for various everyday phenomena. The top row of plots shows the empirical distributions of the total duration or extent, \(t_{total}\), for each of these phenomena. The first two distributions are approximately Gaussian, the third and fourth are approximately power-law, and the fifth and sixth are approximately Erlang. The bottom row shows participants’ predicted values of \(t_{total}\) for a single observed sample t of a duration or extent for each phenomenon. Black dots show the participants’ median predictions of \(t_{total}\). Error bars indicate 68% confidence intervals (estimated by a 1,000-sample bootstrap). Solid lines show the optimal Bayesian predictions based on the empirical prior distributions shown above. Dashed lines show predictions made by estimating a subjective prior, for the pharaohs and waiting-times stimuli, as explained in the main text. Dotted lines show predictions based on a fixed uninformative prior (Gott, 1993).

(Griffiths & Tenenbaum, 2006, p. 769)
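
The "t plus a constant" rule for Erlang priors can be checked numerically. The sketch below assumes the likelihood 1/t_total for t < t_total and the one-parameter Erlang prior from the excerpt, so the posterior above t is proportional to exp(-t_total/β) and its median should sit at t + β ln 2. The value β = 10, the grid step and the cut-off at 40β are arbitrary illustrative choices.

```python
import math

def erlang_prediction(t, beta, step=0.01):
    """Posterior median of t_total under an Erlang prior, computed on a grid."""
    n = int(round(40 * beta / step))                 # tail beyond t + 40*beta is negligible
    grid = [t + (i + 0.5) * step for i in range(n)]  # midpoints, all above t
    # Unnormalised posterior: likelihood (1 / x) times prior x * exp(-x / beta).
    weights = [(1.0 / x) * x * math.exp(-x / beta) for x in grid]
    half = 0.5 * sum(weights)
    running = 0.0
    for x, w in zip(grid, weights):
        running += w
        if running >= half:
            return x  # first grid point where the cumulative mass reaches one half
    return grid[-1]

beta = 10.0  # hypothetical scale parameter
offsets = [erlang_prediction(t, beta) - t for t in (5.0, 20.0, 50.0)]
print(offsets, beta * math.log(2))
```

The offset between the prediction and the observed t comes out the same for every t, matching the slope of 1 and nonzero intercept described for Erlang priors in the Fig. 1 caption.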

RATIONAL VS IRRATIONAL
Bernoulli (1713)
Kahneman & Tversky: 2002 Nobel Prize in Economics
John Anderson: rational analysis
• computational cost
• ecology vs economy
• certainty vs uncertainty
• implicit vs explicit (esp verbal) computations
for more, see Anderson (1990)

REFERENCES
Adelson EH (unpubl) http://web.mit.edu/persci/people/adelson/checkershadow_illusion.html
Anderson JR (1990) The adaptive character of thought. Lawrence Erlbaum Associates, Hillsdale, NJ.
Bernoulli J (1713) Ars conjectandi. Thurnisiorum, Basel.
Griffiths TL, Tenenbaum JB (2006) Optimal predictions in everyday cognition. Psychol Sci 17:767-773.
Helmholtz H (1867) Handbuch der physiologischen Optik. L. Voss, Leipzig. (translated into English by JPC Southall as “Treatise on Physiological Optics”)
Kahneman D, Tversky A (1973) On the psychology of prediction. Psychol Rev 80:237-251.
Roediger HL III, McDermott KB (1995) Creating false memories: Remembering words not presented in lists. J Exp Psychol Learn Mem Cogn 21:803-814.