
Page 1: Logistics Class size? Who is new? Who is listening? Everyone on Athena mailing list “concepts- and-theories”? If not write to me. Everyone on stellar yet?

Logistics

• Class size? Who is new? Who is listening?
• Everyone on Athena mailing list “concepts-and-theories”? If not, write to me.
• Everyone on stellar yet? If not, write to Melissa Yeh ([email protected]).
• Interest in having a printed course pack, even if a few readings get changed?

Page 2:

Plan for tonight

• Why be Bayesian?
• Informal introduction to learning as probabilistic inference
• Formal introduction to probabilistic inference
• A little bit of mathematical psychology
• An introduction to Bayes nets

Page 3:

Plan for tonight

• Why be Bayesian?
• Informal introduction to learning as probabilistic inference
• Formal introduction to probabilistic inference
• A little bit of mathematical psychology
• An introduction to Bayes nets

Page 4:

Virtues of Bayesian framework

• Generates principled models with strong explanatory and descriptive power.

Page 5:

Virtues of Bayesian framework

• Generates principled models with strong explanatory and descriptive power.
• Unifies models of cognition across tasks and domains.
  – Tasks: categorization, concept learning, word learning, inductive reasoning, causal inference, conceptual change
  – Domains: biology, physics, psychology, language, . . .

Page 6:

Virtues of Bayesian framework

• Generates principled models with strong explanatory and descriptive power.
• Unifies models of cognition across tasks and domains.
• Explains which processing models work, and why.
  – Associative learning
  – Connectionist networks
  – Similarity to examples
  – Toolkit of simple heuristics

Page 7:

Virtues of Bayesian framework

• Generates principled models with strong explanatory and descriptive power.
• Unifies models of cognition across tasks and domains.
• Explains which processing models work, and why.
• Allows us to move beyond classic dichotomies.
  – Symbols (rules, logic, hierarchies, relations) versus Statistics
  – Domain-general versus Domain-specific
  – Nature versus Nurture

Page 8:

Virtues of Bayesian framework

• Generates principled models with strong explanatory and descriptive power.
• Unifies models of cognition across tasks and domains.
• Explains which processing models work, and why.
• Allows us to move beyond classic dichotomies.
• A framework for understanding theory-based cognition:
  – How are theories used to learn about the structure of the world?
  – How are theories acquired?

Page 9:

Rational statistical inference (Bayes, Laplace)

• Fundamental question: How do we update beliefs in light of data?
• Fundamental (and only) assumption: Represent degrees of belief as probabilities.
• The answer: Mathematics of probability theory.

Page 10:

What does probability mean?

Frequentists: Probability as expected frequency
• P(A) = 1: A will always occur.
• P(A) = 0: A will never occur.
• 0.5 < P(A) < 1: A will occur more often than not.

Subjectivists: Probability as degree of belief
• P(A) = 1: believe A is true.
• P(A) = 0: believe A is false.
• 0.5 < P(A) < 1: believe A is more likely to be true than false.

Page 11:

What does probability mean?

Frequentists: Probability as expected frequency
• P(“heads”) = 0.5 ~ “If we flip 100 times, we expect to see about 50 heads.”

Subjectivists: Probability as degree of belief
• P(“heads”) = 0.5 ~ “On the next flip, it’s an even bet whether it comes up heads or tails.”
• P(“rain tomorrow”) = 0.8
• P(“Saddam Hussein is dead”) = 0.1
• . . .

Page 12:

Is subjective probability cognitively viable?

• Evolutionary psychologists (Gigerenzer, Cosmides, Tooby, Pinker) argue it is not.

Page 13:

“To understand the design of statistical inference mechanisms, then, one needs to examine what form inductive-reasoning problems -- and the information relevant to solving them -- regularly took in ancestral environments. […] Asking for the probability of a single event seems unexceptionable in the modern world, where we are bombarded with numerically expressed statistical information, such as weather forecasts telling us there is a 60% chance of rain today. […] In ancestral environments, the only external database available from which to reason inductively was one's own observations and, possibly, those communicated by the handful of other individuals with whom one lived. The ‘probability’ of a single event cannot be observed by an individual, however. Single events either happen or they don’t -- either it will rain today or it will not. Natural selection cannot build cognitive mechanisms designed to reason about, or receive as input, information in a format that did not regularly exist.”

(Brase, Cosmides and Tooby, 1998)

Page 14:

Is subjective probability cognitively viable?

• Evolutionary psychologists (Gigerenzer, Cosmides, Tooby, Pinker) argue it is not.
• Reasons to think it is:
  – Intuitions are old and potentially universal (Aristotle, the Talmud).
  – Represented in semantics (and syntax?) of natural language.
  – Extremely useful ….

Page 15:

Why be subjectivist?

• Often need to make inferences about singular events.
  – e.g., How likely is it to rain tomorrow?
• Cox Axioms
  – A formal model of common sense.
• “Dutch Book” + Survival of the Fittest
  – If your beliefs do not accord with the laws of probability, then you can always be out-gambled by someone whose beliefs do so accord.
• Provides a theory of learning.
  – A common currency for combining prior knowledge and the lessons of experience.

Page 16:

Cox Axioms (via Jaynes)

• Degrees of belief are represented by real numbers.
• Qualitative correspondence with common sense, e.g.:

  Bel(¬A) = f[Bel(A)]
  Bel(A ∧ B) = g[Bel(A), Bel(B|A)]

• Consistency:
  – If a conclusion can be reasoned in more than one way, then every possible way must lead to the same result.
  – All available evidence should be taken into account when inferring a degree of belief.
  – Equivalent states of knowledge should be represented with equivalent degrees of belief.
• Accepting these axioms implies Bel can be represented as a probability measure.

Page 17:

Plan for tonight

• Why be Bayesian?
• Informal introduction to learning as probabilistic inference
• Formal introduction to probabilistic inference
• A little bit of mathematical psychology
• An introduction to Bayes nets

Page 18:

Example: flipping coins

• Flip a coin 10 times and see 5 heads, 5 tails.
• P(heads) on next flip? 50%.
• Why? 50% = 5 / (5+5) = 5/10.
• “Future will be like the past.”

• Suppose we had seen 4 heads and 6 tails.
• P(heads) on next flip? Closer to 50% than to 40%.
• Why? Prior knowledge.

Page 19:

Example: flipping coins

• Represent prior knowledge as fictional observations F.
• E.g., F = {1000 heads, 1000 tails} ~ strong expectation that any new coin will be fair.
  After seeing 4 heads, 6 tails, P(heads) on next flip = 1004 / (1004+1006) = 49.95%.
• E.g., F = {3 heads, 3 tails} ~ weak expectation that any new coin will be fair.
  After seeing 4 heads, 6 tails, P(heads) on next flip = 7 / (7+9) = 43.75%. Prior knowledge too weak.

Page 20:

Example: flipping thumbtacks

• Represent prior knowledge as fictional observations F.
• E.g., F = {4 heads, 3 tails} ~ weak expectation that tacks are slightly biased towards heads.
  After seeing 2 heads, 0 tails, P(heads) on next flip = 6 / (6+3) = 67%.

• Some prior knowledge is always necessary to avoid jumping to hasty conclusions.
• Suppose F = { }: After seeing 2 heads, 0 tails, P(heads) on next flip = 2 / (2+0) = 100%.
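The pseudo-count arithmetic on the last two slides is easy to sketch in code. The helper below is illustrative (its name and interface are ours, not from the lecture); it simply adds the fictional observations F to the real counts:

```python
def p_heads_next(heads, tails, f_heads=0, f_tails=0):
    """Posterior predictive P(heads on next flip), treating the fictional
    observations F = {f_heads heads, f_tails tails} as extra data."""
    return (heads + f_heads) / (heads + tails + f_heads + f_tails)

# Strong fair-coin prior F = {1000 heads, 1000 tails}, data = 4 heads, 6 tails:
print(p_heads_next(4, 6, 1000, 1000))   # 1004/2010, about 49.95%
# Weak prior F = {3 heads, 3 tails}:
print(p_heads_next(4, 6, 3, 3))         # 7/16 = 43.75%
# Thumbtack prior F = {4 heads, 3 tails}, data = 2 heads, 0 tails:
print(p_heads_next(2, 0, 4, 3))         # 6/9, about 67%
# Empty prior F = { }: jumps straight to certainty.
print(p_heads_next(2, 0))               # 1.0
```

The same helper also reproduces the limitation noted on a later slide: with F = {1000 heads, 1000 tails}, even 25 straight heads gives only 1025/2025 ≈ 50.6%.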

Page 21:

Origin of prior knowledge

• Tempting answer: prior experience.
• Suppose you have previously seen 2000 coin flips: 1000 heads, 1000 tails.
• By assuming all coins (and flips) are alike, these observations of other coins are as good as actual observations of the present coin.

Page 22:

Problems with simple empiricism

• Haven’t really seen 2000 coin flips, or any thumbtack flips.
  – Prior knowledge is stronger than raw experience justifies.
• Haven’t seen exactly equal numbers of heads and tails.
  – Prior knowledge is smoother than raw experience justifies.
• Should be a difference between observing 2000 flips of a single coin versus observing 10 flips each for 200 coins, or 1 flip each for 2000 coins.
  – Prior knowledge is more structured than raw experience.

Page 23:

A simple theory

• “Coins are manufactured by a standardized procedure that is effective but not perfect.”
  – Justifies generalizing from previous coins to the present coin.
  – Justifies a smoother and stronger prior than raw experience alone.
  – Explains why seeing 10 flips each for 200 coins is more valuable than seeing 2000 flips of one coin.
• “Tacks are asymmetric, and manufactured to less exacting standards.”

Page 24:

Limitations

• Can all domain knowledge be represented so simply, in terms of an equivalent number of fictional observations?

• Suppose you flip a coin 25 times and get all heads. Something funny is going on ….

• But with F ={1000 heads, 1000 tails}, P(heads) on next flip = 1025 / (1025+1000) = 50.6%. Looks like nothing unusual.

Page 25:

Plan for tonight

• Why be Bayesian?
• Informal introduction to learning as probabilistic inference
• Formal introduction to probabilistic inference
• A little bit of mathematical psychology
• An introduction to Bayes nets

Page 26:

Basics

• Propositions: A, B, C, . . .
• Negation: ¬A
• Logical operators “and”, “or”: A ∧ B, A ∨ B
• Obey classical logic, e.g., A ∧ B = ¬(¬A ∨ ¬B)

Page 27:

Basics

• Conservation of belief: P(A) + P(¬A) = 1
• “Joint probability”: P(A ∧ B), also written P(A, B)
• For independent propositions: P(A, B) = P(A) P(B)
• More generally: P(A, B) = P(A) P(B|A) = P(B) P(A|B), where P(B|A) is the “conditional probability” of B given A.

Page 28:

Basics

• Example:
  – A = “Heads on flip 2”
  – B = “Tails on flip 2”
• If A and B were independent: P(A, B) = P(A) P(B) = 1/2 × 1/2 = 1/4.
• But in fact P(A, B) = P(A) P(B|A) = 1/2 × 0 = 0.
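The flip-2 example can be verified by brute-force enumeration over the four equally likely outcomes of two fair flips (a quick sketch, not from the slides):

```python
from itertools import product

outcomes = list(product("HT", repeat=2))   # HH, HT, TH, TT: each probability 1/4

def p(event):
    """Probability of an event by counting equally likely outcomes."""
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

A = lambda o: o[1] == "H"                  # "heads on flip 2"
B = lambda o: o[1] == "T"                  # "tails on flip 2"

print(p(A) * p(B))                  # 0.25: what independence would predict
print(p(lambda o: A(o) and B(o)))   # 0.0: A and B are mutually exclusive
```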

Page 29:

Basics

• All probabilities should be conditioned on background knowledge K: e.g., P(A | K).
• All the same rules hold conditioned on any K: e.g., P(A, B | K) = P(A | K) P(B | A, K).
• Often background knowledge will be implicit, brought in as needed.

Page 30:

Bayesian inference

• Definition of conditional probability:

  P(A, B) = P(A) P(B|A) = P(B) P(A|B)

• Bayes’ theorem:

  P(B|A) = P(A|B) P(B) / P(A)

Page 31:

Bayesian inference

• Definition of conditional probability:

  P(A, B) = P(A) P(B|A) = P(B) P(A|B)

• Bayes’ rule:

  P(H|D) = P(D|H) P(H) / P(D)

• “Posterior probability”: P(H|D)
• “Prior probability”: P(H)
• “Likelihood”: P(D|H)

Page 32:

Bayesian inference

• Bayes’ rule:

  P(H|D) = P(D|H) P(H) / P(D)

• What makes a good scientific argument? P(H|D) is high if:
  – Hypothesis is plausible: P(H) is high.
  – Hypothesis strongly predicts the observed data: P(D|H) is high.
  – Data are surprising: P(D) is low.

Page 33:

Bayesian inference

• Deriving a more useful version:

  P(B|A) = P(A|B) P(B) / P(A)
  P(¬B|A) = P(A|¬B) P(¬B) / P(A)
  P(B|A) + P(¬B|A) = 1

Page 34:

Bayesian inference

• Deriving a more useful version:

  P(B|A) = P(A|B) P(B) / P(A)
  P(¬B|A) = P(A|¬B) P(¬B) / P(A)
  P(B) P(A|B) / P(A) + P(¬B) P(A|¬B) / P(A) = 1

Page 35:

Bayesian inference

• Deriving a more useful version:

  P(B|A) = P(A|B) P(B) / P(A)
  P(¬B|A) = P(A|¬B) P(¬B) / P(A)
  P(B) P(A|B) + P(¬B) P(A|¬B) = P(A)    “Conditionalization”
  P(A, B) + P(A, ¬B) = P(A)             “Marginalization”

Page 36:

Bayesian inference

• Deriving a more useful version:

  P(B|A) = P(A|B) P(B) / P(A)
  P(¬B|A) = P(A|¬B) P(¬B) / P(A)
  P(B) P(A|B) + P(¬B) P(A|¬B) = P(A)

Page 37:

Bayesian inference

• Deriving a more useful version:

  P(B|A) = P(A|B) P(B) / P(A)
  P(A) = P(B) P(A|B) + P(¬B) P(A|¬B)

Page 38:

Bayesian inference

• Deriving a more useful version:

  P(B|A) = P(B) P(A|B) / [P(B) P(A|B) + P(¬B) P(A|¬B)]

Page 39:

Bayesian inference

• Deriving a more useful version:

  P(H|D) = P(H) P(D|H) / [P(H) P(D|H) + P(¬H) P(D|¬H)]
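This final form is a one-line function. A sketch (the function name is ours); the numbers in the usage example anticipate the fair-coin-versus-trick-coin analysis later in the lecture:

```python
def posterior(prior_h, like_h, like_not_h):
    """P(H|D) = P(H)P(D|H) / [P(H)P(D|H) + P(not-H)P(D|not-H)]."""
    num = prior_h * like_h
    return num / (num + (1 - prior_h) * like_not_h)

# H = "the coin is fair", prior 999/1000; D = HHHHH, so P(D|H) = 1/32.
# Under the trick all-heads alternative, P(D|not-H) = 1.
print(posterior(0.999, 1/32, 1.0))   # about 0.97: five heads barely dent the prior
```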

Page 40:

Random variables

• Random variable X denotes a set of mutually exclusive, exhaustive propositions (states of the world):

  X = {x1, …, xn},  Σi P(X = xi) = 1

• Bayes’ theorem for random variables:

  P(Y = y | X = x) = P(Y = y) P(X = x | Y = y) / Σi P(Y = yi) P(X = x | Y = yi)

Page 41:

Random variables

• Random variable X denotes a set of mutually exclusive, exhaustive propositions (states of the world):

  X = {x1, …, xn},  Σi P(X = xi) = 1

• Bayes’ rule for more than two hypotheses:

  P(H = h | D = d) = P(H = h) P(D = d | H = h) / Σi P(H = hi) P(D = d | H = hi)
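The same rule over a finite hypothesis space is just a normalization. A minimal sketch (names ours), which also illustrates the Sherlock Holmes point on the next slides: zero-likelihood hypotheses drop out of the sum entirely:

```python
def posterior_dist(priors, likelihoods):
    """P(H=hi | D=d), proportional to P(H=hi) P(D=d | H=hi),
    normalized over the whole hypothesis space."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(joint)                     # the denominator sum over hypotheses
    return [j / z for j in joint]

# Eliminate the impossible: the hypothesis with zero likelihood vanishes,
# and whatever remains, however improbable a priori, takes all the mass.
print(posterior_dist([0.01, 0.99], [0.5, 0.0]))   # [1.0, 0.0]
```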

Page 42:

Sherlock Holmes

• “How often have I said to you that when you have eliminated the impossible whatever remains, however improbable, must be the truth?” (The Sign of the Four)

  P(H = h | D = d) = P(H = h) P(D = d | H = h) / Σi P(H = hi) P(D = d | H = hi)

Page 43:

Sherlock Holmes

• “How often have I said to you that when you have eliminated the impossible whatever remains, however improbable, must be the truth?” (The Sign of the Four)

  P(h|d) = P(h) P(d|h) / [P(h) P(d|h) + Σ_{hi ≠ h} P(hi) P(d|hi)]

Page 44:

Sherlock Holmes

• “How often have I said to you that when you have eliminated the impossible whatever remains, however improbable, must be the truth?” (The Sign of the Four)

  P(h|d) = P(h) P(d|h) / [P(h) P(d|h) + Σ_{hi ≠ h} P(hi) P(d|hi)]

  Eliminating the impossible sets the second term in the denominator to 0.

Page 45:

Sherlock Holmes

• “How often have I said to you that when you have eliminated the impossible whatever remains, however improbable, must be the truth?” (The Sign of the Four)

  P(h|d) = P(h) P(d|h) / [P(h) P(d|h)] = 1

Page 46:

Plan for tonight

• Why be Bayesian?
• Informal introduction to learning as probabilistic inference
• Formal introduction to probabilistic inference
• A little bit of mathematical psychology
• An introduction to Bayes nets

Page 47:

Representativeness in reasoning

Which sequence is more likely to be produced by flipping a fair coin?

HHTHT

HHHHH

Page 48:

A reasoning fallacy

Kahneman & Tversky: people judge the probability of an outcome based on the extent to which it is representative of the generating process.

But how does “representativeness” work?

Page 49:

Predictive versus inductive reasoning

[Diagram: Hypothesis (H) linked to Data (D)]

Page 50:

Predictive versus inductive reasoning

[Diagram: Prediction runs from given H to unknown D.]

Likelihood: P(D|H)

Page 51:

Predictive versus inductive reasoning

[Diagram: Prediction runs from given H to unknown D; induction runs from given D to unknown H.]

Likelihood: P(D|H)
Representativeness: P(D|H1) / P(D|H2)

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]

Page 52:

Bayes’ Rule in odds form

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]

D: data
H1, H2: models
P(H1|D): posterior probability that model 1 generated the data
P(D|H1): likelihood of the data given model 1
P(H1): prior probability that model 1 generated the data

Page 53:

Bayesian analysis of coin flipping

D: HHTHT
H1, H2: fair coin, trick “all heads” coin

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]

P(D|H1) = 1/32    P(H1) = 999/1000
P(D|H2) = 0       P(H2) = 1/1000

P(H1|D) / P(H2|D) = infinity

Page 54:

Bayesian analysis of coin flipping

D: HHHHH
H1, H2: fair coin, trick “all heads” coin

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]

P(D|H1) = 1/32    P(H1) = 999/1000
P(D|H2) = 1       P(H2) = 1/1000

P(H1|D) / P(H2|D) = 999/32 ~ 30:1

Page 55:

Bayesian analysis of coin flipping

D: HHHHHHHHHH
H1, H2: fair coin, trick “all heads” coin

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]

P(D|H1) = 1/1024    P(H1) = 999/1000
P(D|H2) = 1         P(H2) = 1/1000

P(H1|D) / P(H2|D) = 999/1024 ~ 1:1
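The three calculations on pages 53–55 can be reproduced with a small odds-form helper (the function name is ours):

```python
def posterior_odds(prior1, prior2, like1, like2):
    """Posterior odds P(H1|D)/P(H2|D) = (P(D|H1)/P(D|H2)) * (P(H1)/P(H2))."""
    if like2 == 0:
        return float("inf")            # H2 is eliminated by the data
    return (like1 / like2) * (prior1 / prior2)

fair, trick = 999/1000, 1/1000         # H1 = fair coin, H2 = trick all-heads coin

print(posterior_odds(fair, trick, 1/32, 0))     # HHTHT: infinite odds for fair
print(posterior_odds(fair, trick, 1/32, 1))     # HHHHH: 999/32, about 30:1
print(posterior_odds(fair, trick, 1/1024, 1))   # ten heads: 999/1024, about 1:1
```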

Page 56:

The role of theories

The fact that HHTHT looks representative of a fair coin and HHHHH does not reflects our implicit theories of how the world works.
– Easy to imagine how a trick all-heads coin could work: high prior probability.
– Hard to imagine how a trick “HHTHT” coin could work: low prior probability.

Page 57:

Plan for tonight

• Why be Bayesian?
• Informal introduction to learning as probabilistic inference
• Formal introduction to probabilistic inference
• A little bit of mathematical psychology
• An introduction to Bayes nets

Page 58:

Scaling up

• Three binary variables: Cavity, Toothache, Catch (whether the dentist’s probe catches in your tooth).

  P(cav | ache ∧ catch) = P(ache ∧ catch | cav) P(cav) /
    [P(ache ∧ catch | cav) P(cav) + P(ache ∧ catch | ¬cav) P(¬cav)]

Page 59:

Scaling up

• Three binary variables: Cavity, Toothache, Catch (whether the dentist’s probe catches in your tooth).
• With n pieces of evidence, we need 2^(n+1) conditional probabilities.
• Here n = 2. Realistically, many more: X-ray, diet, oral hygiene, personality, . . .

  P(cav | ache ∧ catch) = P(ache ∧ catch | cav) P(cav) /
    [P(ache ∧ catch | cav) P(cav) + P(ache ∧ catch | ¬cav) P(¬cav)]

Page 60:

Conditional independence

• All three variables are dependent, but Toothache and Catch are independent given the presence or absence of Cavity.
• Both Toothache and Catch are caused by Cavity, but via independent causal mechanisms.
• In probabilistic terms:

  P(ache ∧ catch | cav) = P(ache | cav) P(catch | cav)
  P(ache ∧ catch | ¬cav) = P(ache | ¬cav) P(catch | ¬cav)

• With n pieces of evidence, x1, …, xn, we need only 2n conditional probabilities:

  P(xi | cav), P(xi | ¬cav)
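The savings from conditional independence are easy to tabulate. A sketch, reading the slide's counts as 2^(n+1) numbers for the full likelihood tables versus 2n for the conditionally independent ones:

```python
def naive_params(n):
    """Entries in the full tables P(x1,...,xn | cav) and P(x1,...,xn | ¬cav)."""
    return 2 ** (n + 1)

def conditionally_independent_params(n):
    """Entries P(xi | cav) and P(xi | ¬cav): one pair per evidence variable."""
    return 2 * n

for n in (2, 10, 20):
    print(n, naive_params(n), conditionally_independent_params(n))
# At n = 20, over two million numbers collapse to forty.
```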

Page 61:

A simple Bayes net

• Graphical representation of relations between a set of random variables:

  [Diagram: Cavity → Toothache, Cavity → Catch]

• Causal interpretation: independent local mechanisms
• Probabilistic interpretation: factorizing complex terms

  P(Ache, Catch, Cav) = P(Ache, Catch | Cav) P(Cav)
                      = P(Ache | Cav) P(Catch | Cav) P(Cav)

  In general: P(A, B, C) = Π_{V ∈ {A, B, C}} P(V | parents[V])

Page 62:

A more complex system

[Diagram: Battery → Radio; Battery → Ignition; Ignition, Gas → Starts; Starts → On time to work]

• Joint distribution sufficient for any inference:

  P(B, R, I, G, S, O) = P(B) P(R|B) P(I|B) P(G) P(S|I,G) P(O|S)

  P(O|G) = P(O, G) / P(G)
         = Σ_{B,R,I,S} P(B) P(R|B) P(I|B) P(G) P(S|I,G) P(O|S) / P(G)

Page 63:

A more complex system

• Joint distribution sufficient for any inference:

  P(B, R, I, G, S, O) = P(B) P(R|B) P(I|B) P(G) P(S|I,G) P(O|S)

  P(O|G) = P(O, G) / P(G) = Σ_S P(O|S) Σ_{B,I} P(B) P(I|B) P(S|I,G)

Page 64:

A more complex system

• Joint distribution sufficient for any inference:

  P(B, R, I, G, S, O) = P(B) P(R|B) P(I|B) P(G) P(S|I,G) P(O|S)

• General inference algorithm: local message passing
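The factorized joint really is sufficient for any inference: the sketch below answers queries on this network by brute-force enumeration (not the local message-passing algorithm mentioned above). All CPT numbers are illustrative assumptions:

```python
from itertools import product

p_b = 0.9                                   # P(Battery ok)
p_r = {1: 0.95, 0: 0.01}                    # P(Radio | B)
p_i = {1: 0.97, 0: 0.01}                    # P(Ignition | B)
p_g = 0.95                                  # P(Gas)
p_s = {(1, 1): 0.99, (1, 0): 0.0,           # P(Starts | I, G): needs both
       (0, 1): 0.0, (0, 0): 0.0}
p_o = {1: 0.9, 0: 0.0}                      # P(On time | S)

def bern(p_true, v):
    return p_true if v else 1 - p_true

def joint(b, r, i, g, s, o):
    """P(B,R,I,G,S,O) = P(B) P(R|B) P(I|B) P(G) P(S|I,G) P(O|S)."""
    return (bern(p_b, b) * bern(p_r[b], r) * bern(p_i[b], i)
            * bern(p_g, g) * bern(p_s[(i, g)], s) * bern(p_o[s], o))

def query(target, evidence):
    """P(target | evidence) by summing the joint over all other variables."""
    num = den = 0.0
    for vals in product((0, 1), repeat=6):
        world = dict(zip("brigso", vals))
        if any(world[v] != x for v, x in evidence.items()):
            continue
        p = joint(*vals)
        den += p
        if all(world[v] == x for v, x in target.items()):
            num += p
    return num / den

print(query({"o": 1}, {"g": 1}))   # P(on time | gas)
print(query({"o": 1}, {"g": 0}))   # 0.0: no gas, so the car never starts
```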

Page 65:

Explaining away

[Diagram: Rain → Grass Wet ← Sprinkler]

• Assume grass will be wet if and only if it rained last night, or if the sprinklers were left on:

  P(R, S, W) = P(R) P(S) P(W | R, S)
  P(W = w | R, S) = 1 if R = r or S = s
                  = 0 if R = ¬r and S = ¬s

Page 66:

Explaining away

Compute probability it rained last night, given that the grass is wet:

  P(r | w) = P(w | r) P(r) / P(w)

Page 67:

Explaining away

Compute probability it rained last night, given that the grass is wet:

  P(r | w) = P(w | r) P(r) / Σ_{r′, s′} P(w | r′, s′) P(r′, s′)

Page 68:

Explaining away

Compute probability it rained last night, given that the grass is wet:

  P(r | w) = P(r) / [P(r, s) + P(r, ¬s) + P(¬r, s)]

Page 69:

Explaining away

Compute probability it rained last night, given that the grass is wet:

  P(r | w) = P(r) / [P(r) + P(¬r, s)]

Page 70:

Explaining away

Compute probability it rained last night, given that the grass is wet:

  P(r | w) = P(r) / [P(r) + P(¬r) P(s)]  ≥  P(r)

  (The denominator lies between P(s) and 1.)

Page 71:

Explaining away

Compute probability it rained last night, given that the grass is wet and sprinklers were left on:

  P(r | w, s) = P(w | r, s) P(r | s) / P(w | s)

  Both P(w | r, s) and P(w | s) = 1.

Page 72:

Explaining away

Compute probability it rained last night, given that the grass is wet and sprinklers were left on:

  P(r | w, s) = P(r | s) = P(r)

Page 73:

Explaining away

Compute probability it rained last night, given that the grass is wet and sprinklers were left on:

  P(r | w, s) = P(r | s) = P(r)  <  P(r) / [P(r) + P(¬r) P(s)] = P(r | w)

  “Discounting” to prior probability.
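The whole explaining-away derivation can be checked numerically on the deterministic-OR net (the priors P(r) = 0.2 and P(s) = 0.4 are illustrative assumptions):

```python
p_rain, p_spr = 0.2, 0.4                 # illustrative priors P(r), P(s)

def joint(r, s, w):
    """P(R,S,W) = P(R) P(S) P(W|R,S), with w = 1 iff r or s."""
    if w != (1 if (r or s) else 0):
        return 0.0
    return (p_rain if r else 1 - p_rain) * (p_spr if s else 1 - p_spr)

states = [(r, s, w) for r in (0, 1) for s in (0, 1) for w in (0, 1)]

def cond(target, given):
    """P(target | given); each is a dict over positions (0=R, 1=S, 2=W)."""
    fits = lambda st, fix: all(st[i] == v for i, v in fix.items())
    den = sum(joint(*st) for st in states if fits(st, given))
    num = sum(joint(*st) for st in states if fits(st, {**given, **target}))
    return num / den

print(cond({0: 1}, {2: 1}))          # P(r|w) = 0.2/0.52 ≈ 0.385, above P(r) = 0.2
print(cond({0: 1}, {1: 1, 2: 1}))    # P(r|w,s) ≈ 0.2: discounted back to the prior
```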

Page 74:

Contrast w/ spreading activation

[Diagram: Rain → Grass Wet ← Sprinkler]

• Excitatory links: Rain → Wet, Sprinkler → Wet
• Observing rain, Wet becomes more active.
• Observing grass wet, Rain and Sprinkler become more active.
• Observing grass wet and sprinkler, Rain cannot become less active. No explaining away!

Page 75

Contrast w/ spreading activation

[Graph: Rain and Sprinkler each linked to Grass Wet, plus a Rain–Sprinkler link]

• Excitatory links: Rain ↔ Wet, Sprinkler ↔ Wet
• Inhibitory link: Rain ↔ Sprinkler
• Observing grass wet, Rain and Sprinkler become more active.
• Observing grass wet and sprinkler, Rain becomes less active: explaining away.

Page 76

Contrast w/ spreading activation

[Graph: Rain, Sprinkler, and Burst pipe all linked to Grass Wet, with inhibitory links among them]

• Each new variable requires more inhibitory connections.
• Interactions between variables are not causal.
• Not modular:
  – Whether a connection exists depends on what other connections exist, in non-transparent ways.
  – Big holism problem.
  – Combinatorial explosion.

Page 77

Causality and the Markov property

• Markov property: Any variable is conditionally independent of its non-descendants, given its parents.

• Example:

[Graph: Ache ← Cavity → Catch]

P(A, B, C) = ∏_{V ∈ {A, B, C}} P(V | parents[V])

P(Ache, Catch | Cav) = P(Ache, Catch, Cav) / P(Cav)

Page 78

Causality and the Markov property

• Markov property: Any variable is conditionally independent of its non-descendants, given its parents.

• Example:

[Graph: Ache ← Cavity → Catch]

P(A, B, C) = ∏_{V ∈ {A, B, C}} P(V | parents[V])

P(Ache, Catch | Cav) = P(Ache | Cav) P(Catch | Cav) P(Cav) / P(Cav)

Page 79

Causality and the Markov property

• Markov property: Any variable is conditionally independent of its non-descendants, given its parents.

• Example:

[Graph: Ache ← Cavity → Catch]

P(A, B, C) = ∏_{V ∈ {A, B, C}} P(V | parents[V])

P(Ache, Catch | Cav) = P(Ache | Cav) P(Catch | Cav)
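The derivation on the last three slides can be checked numerically. Only the factorization P(A, B, C) = ∏ P(V | parents[V]) comes from the slides; the CPT values below are hypothetical.

```python
import math

# Hypothetical numbers for the Cavity -> {Ache, Catch} network.
p_cav = 0.1
p_ache_given = {True: 0.7, False: 0.05}   # P(Ache | Cavity)
p_catch_given = {True: 0.9, False: 0.1}   # P(Catch | Cavity)

def joint(ache, catch, cav):
    # P(A, B, C) = P(Ache | Cav) P(Catch | Cav) P(Cav)
    pa = p_ache_given[cav] if ache else 1 - p_ache_given[cav]
    pb = p_catch_given[cav] if catch else 1 - p_catch_given[cav]
    pc = p_cav if cav else 1 - p_cav
    return pa * pb * pc

# P(Ache, Catch | Cav) = P(Ache, Catch, Cav) / P(Cav) ...
lhs = joint(True, True, True) / p_cav
# ... which reduces to P(Ache | Cav) P(Catch | Cav), as derived above.
rhs = p_ache_given[True] * p_catch_given[True]

print(math.isclose(lhs, rhs))  # True
```

The P(Cav) factors cancel exactly as on the slide, leaving the two symptoms conditionally independent given the cavity.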

Page 80

Causality and the Markov property

• Markov property: Any variable is conditionally independent of its non-descendants, given its parents.

• Example:

[Graph: Rain → Grass Wet ← Sprinkler]

P(A, B, C) = ∏_{V ∈ {A, B, C}} P(V | parents[V])

P(Rain, Sprinkler) = Σ_Wet P(Rain, Sprinkler, Wet)

Page 81

Causality and the Markov property

• Markov property: Any variable is conditionally independent of its non-descendants, given its parents.

• Example:

[Graph: Rain → Grass Wet ← Sprinkler]

P(A, B, C) = ∏_{V ∈ {A, B, C}} P(V | parents[V])

P(Rain, Sprinkler) = Σ_Wet P(Wet | Rain, Sprinkler) P(Rain) P(Sprinkler)

Σ_Wet P(Wet | Rain, Sprinkler) = 1, for any values of Rain and Sprinkler

Page 82

Causality and the Markov property

• Markov property: Any variable is conditionally independent of its non-descendants, given its parents.

• Example:

[Graph: Rain → Grass Wet ← Sprinkler]

P(A, B, C) = ∏_{V ∈ {A, B, C}} P(V | parents[V])

P(Rain, Sprinkler) = P(Rain) P(Sprinkler)
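The marginalization step above can be sketched directly: summing Wet out of the factored joint recovers P(Rain) P(Sprinkler) no matter what CPT links the causes to Wet. The priors and CPT values below are hypothetical.

```python
# Marginal independence of the two root causes in Rain -> Wet <- Sprinkler.
p_rain, p_spr = 0.2, 0.4  # hypothetical priors

def p_wet_given(r, s):
    # Any CPT works here: Σ_Wet P(Wet | r, s) = 1 regardless of its values.
    return 0.95 if (r or s) else 0.01

for r in (True, False):
    for s in (True, False):
        pr = p_rain if r else 1 - p_rain
        ps = p_spr if s else 1 - p_spr
        # Σ_Wet P(Wet | Rain, Sprinkler) P(Rain) P(Sprinkler)
        marginal = sum(
            pr * ps * (p_wet_given(r, s) if w else 1 - p_wet_given(r, s))
            for w in (True, False)
        )
        # P(Rain, Sprinkler) = P(Rain) P(Sprinkler)
        assert abs(marginal - pr * ps) < 1e-12

print("Rain and Sprinkler are marginally independent")
```

This is the Markov property at work: before Wet is observed, the two parents carry no information about each other.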

Page 83

Causality and the Markov property

• Markov property: Any variable is conditionally independent of its non-descendants, given its parents.

• Suppose we get the direction of causality wrong, thinking that “symptoms” cause “diseases”:

[Graph: Ache → Cavity ← Catch]

• Does not capture the correlation between symptoms: falsely believe P(Ache, Catch) = P(Ache) P(Catch).
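To see the failure concretely: in the correct Cavity → {Ache, Catch} model, marginalizing the cavity out leaves the symptoms correlated, so the reversed model's assumption P(Ache, Catch) = P(Ache) P(Catch) is false. A minimal sketch with hypothetical CPT values:

```python
# Hypothetical numbers for the correct causal model Cavity -> {Ache, Catch}.
p_cav = 0.1
p_ache = {True: 0.7, False: 0.05}   # P(Ache | Cavity)
p_catch = {True: 0.9, False: 0.1}   # P(Catch | Cavity)

def marg(table):
    # Marginalize Cavity out of a single-symptom CPT.
    return table[True] * p_cav + table[False] * (1 - p_cav)

# P(Ache, Catch) = Σ_Cav P(Ache | Cav) P(Catch | Cav) P(Cav)
p_ache_catch = (p_ache[True] * p_catch[True] * p_cav
                + p_ache[False] * p_catch[False] * (1 - p_cav))

print(round(p_ache_catch, 4))                  # 0.0675
print(round(marg(p_ache) * marg(p_catch), 4))  # 0.0207
```

The true joint (0.0675) is much larger than the product of marginals (0.0207): the shared cause induces a correlation that the symptoms-cause-disease graph cannot represent without extra arrows.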

Page 84

Causality and the Markov property

• Markov property: Any variable is conditionally independent of its non-descendants, given its parents.

• Suppose we get the direction of causality wrong, thinking that “symptoms” cause “diseases”:

[Graph: Ache → Cavity ← Catch, plus a new arrow Ache → Catch]

• Inserting a new arrow allows us to capture this correlation.
• This model is too complex: it does not encode the belief that

P(Ache, Catch | Cav) = P(Ache | Cav) P(Catch | Cav)

Page 85

Causality and the Markov property

• Markov property: Any variable is conditionally independent of its non-descendants, given its parents.

• Suppose we get the direction of causality wrong, thinking that “symptoms” cause “diseases”:

[Graph: Ache, Catch, and X-ray all pointing to Cavity, with arrows among the symptoms]

• New symptoms require a combinatorial proliferation of new arrows. Too general, not modular, holism, yuck . . . .

Page 86

Still to come

• Applications to models of categorization
• More on the relation between causality and probability:
  Causal structure ↔ Statistical dependencies
  – Learning causal graph structures.
  – Learning causal abstractions (“diseases cause symptoms”).
• What’s missing

Page 87

The end

Page 88

Mathcamp data: raw

Page 89

Mathcamp data: collapsed over parity

Page 90

Zenith radio data: collapsed over parity

Page 91