Logic and Probability - Stanford University · web.stanford.edu/~icard/esslli2014/lecture1.pdf · 2014-08-15

Logic and Probability
Lecture 1: Probability as Logic

Wesley Holliday & Thomas Icard
UC Berkeley & Stanford

August 11, 2014
ESSLLI, Tübingen


Overview

Logic as a theory of:

• truth-preserving inference
• consistency
• definability
• proof / deduction
• rationality

. . .

Probability as a theory of:

• uncertain inference
• learning
• information
• induction
• rationality

. . .


Some questions and points of contact:

• In what ways might probability be said to extend logic?
• How do probability and various logical systems differ on what they say about rational inference?
• Can logic be used to gain a better understanding of probability? Say, through standard techniques of formalization, definability, questions of completeness, complexity, etc.?
• Can the respective advantages of logical representation and probabilistic learning and inference be combined to create more powerful reasoning systems?


Course Outline

• Day 1: Probability as (Extended) Logic
• Day 2: Probability, Nonmonotonicity, and Graphical Models
• Day 3: Beyond Boolean Logic
• Day 4: Qualitative Probability
• Day 5: Logical Dynamics


Interpretations of Probability

• Frequentist: Probabilities are about ‘limiting frequencies’ of events.
• Propensity: Probabilities are about physical dispositions, or propensities, of events.
• Logical: Probabilities are determined objectively via a logical language and some additional principles, e.g., of ‘symmetry’.
• Bayesian: Probabilities are subjective and reflect an agent’s degree of confidence concerning some event.

...

We will mostly remain neutral, focusing on mathematical questions, but will note when the interpretation is critical.


A measurable space is a pair (W, ℰ), where

• W is an arbitrary set;
• ℰ is a σ-algebra over W, i.e., a subset of ℘(W) closed under complement and countable union.

A probability space is a triple (W, ℰ, µ), with (W, ℰ) a measurable space and µ : ℰ → [0, 1] a measure function satisfying:

• µ(W) = 1;
• µ(E ∪ F) = µ(E) + µ(F), whenever E ∩ F = ∅.

A random variable from (W, ℰ, µ) to some other measurable space (V, ℱ) is a function X : W → V such that:

for every F ∈ ℱ, the preimage X⁻¹(F) = {w ∈ W : X(w) ∈ F} belongs to ℰ.

Then µ induces a natural distribution for X, typically written

P(X = v) = µ({w ∈ W : X(w) = v}).
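The induced distribution can be illustrated on a small finite space. The following is only a sketch under assumptions that are not in the slides: the space of two fair coin tosses, the σ-algebra taken to be the full power set, and the random variable "number of heads" are all hypothetical choices.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite probability space: two fair coin tosses.
# The sigma-algebra is assumed to be the full power set, so every X is measurable.
W = ["HH", "HT", "TH", "TT"]
mu = {w: Fraction(1, 4) for w in W}  # measure given on singletons


def X(w):
    """Random variable: number of heads in outcome w."""
    return w.count("H")


def induced_distribution(mu, X):
    """Return P(X = v) = mu({w : X(w) = v}) for every value v that X takes."""
    dist = defaultdict(Fraction)  # Fraction() is 0
    for w, p in mu.items():
        dist[X(w)] += p
    return dict(dist)


dist = induced_distribution(mu, X)
print(dist)
```

Exact fractions are used so that the induced probabilities can be compared without rounding.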


Suppose we have a propositional logical language L generated as follows

ϕ ::= A | B | . . . | ϕ ∧ ϕ | ¬ϕ

We can define a probability P : L → [0, 1] directly on L, requiring

• P(ϕ) = 1, for any tautology ϕ;
• P(ϕ ∨ ψ) = P(ϕ) + P(ψ), whenever ⊨ ¬(ϕ ∧ ψ).

Equivalent set of requirements:

• P(ϕ) = 1, for any tautology ϕ;
• P(ϕ) ≤ P(ψ), whenever ⊨ ϕ → ψ;
• P(ϕ) = P(ϕ ∧ ψ) + P(ϕ ∧ ¬ψ).


It is then easy to show:

• P(ϕ) = 0, for any contradiction ϕ;
• P(¬ϕ) = 1 − P(ϕ);
• P(ϕ ∨ ψ) = P(ϕ) + P(ψ) − P(ϕ ∧ ψ);
• A propositional valuation sending atoms to 1 or 0 is a special case of a probability function;
• A probability on L gives rise to a standard probability measure over ‘world-states’, i.e., maximally consistent sets of formulas from L.


With this setup we can easily define, e.g., conditional probability:

$$P(\varphi \mid \psi) = \frac{P(\varphi \wedge \psi)}{P(\psi)},$$

provided P(ψ) > 0. With this it is easy to prove Bayes’ Theorem:

Theorem (Bayes)

$$P(\varphi \mid \psi) = \frac{P(\psi \mid \varphi)\,P(\varphi)}{P(\psi)}.$$
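A probability on formulas can be pictured as weights on world-states, with P(ϕ) the total weight of the worlds where ϕ holds. The sketch below assumes a hypothetical two-atom language and made-up weights, represents formulas as Python predicates on worlds, and checks Bayes' Theorem numerically for one instance.

```python
from fractions import Fraction
from itertools import product

ATOMS = ("A", "B")
# World-states: all valuations of the atoms, in the order TT, TF, FT, FF.
WORLDS = [dict(zip(ATOMS, bits)) for bits in product([True, False], repeat=len(ATOMS))]
# Hypothetical weights on the world-states (they sum to 1).
WEIGHTS = [Fraction(4, 10), Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)]


def P(formula):
    """Probability of a formula: total weight of the worlds where it is true."""
    return sum(wt for world, wt in zip(WORLDS, WEIGHTS) if formula(world))


def cond(phi, psi):
    """P(phi | psi) = P(phi and psi) / P(psi), defined only when P(psi) > 0."""
    if P(psi) == 0:
        raise ValueError("conditioning on a probability-zero formula")
    return P(lambda w: phi(w) and psi(w)) / P(psi)


def A(w):
    return w["A"]


def B(w):
    return w["B"]


lhs = cond(A, B)                # P(A | B)
rhs = cond(B, A) * P(A) / P(B)  # P(B | A) P(A) / P(B)
print(lhs, rhs)
```

Both sides agree because each reduces to P(A ∧ B) / P(B), which is the one-line proof of the theorem.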


Today’s Theme: Probability as Logic

Three senses in which probability has been claimed as part of logic:

1. Logical Deduction and Probability Preservation

2. Consistency and Fair Odds

3. Patterns of Plausible Reasoning


Part I: Logical Deduction and Probability Preservation


Logical Deduction and Probability Preservation

Suppose we can show Γ ⊢ ϕ in propositional logic, but the assumptions γ ∈ Γ themselves are uncertain.

Knowing the probabilities of the formulas in Γ, what can we conclude about the probability of ϕ?

Example (Adams & Levine)
A telephone survey of 1,000 people asks about their political party affiliation. Suppose 629 people report being Democrats, with some possibility of error in each case. What can we say about the probability of the following statement?

At least 600 people are Democrats.



Proposition (Suppes 1966)
If Γ ⊢ ϕ, then for all P we have

$$P(\neg\varphi) \le \sum_{\gamma \in \Gamma} P(\neg\gamma).$$
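The bound can be spot-checked numerically. The sketch below is an illustration, not a proof: it assumes the hypothetical inference A, A → B ⊢ B, samples random probability assignments over the four valuations of (A, B), and tests the inequality on each sample.

```python
import random
from itertools import product

WORLDS = list(product([True, False], repeat=2))  # valuations of the atoms (A, B)


def prob(weights, formula):
    """Total weight of the valuations where the formula is true."""
    return sum(wt for (a, b), wt in zip(WORLDS, weights) if formula(a, b))


# Hypothetical example inference: A, A -> B |- B (modus ponens).
premises = [lambda a, b: a, lambda a, b: (not a) or b]


def conclusion(a, b):
    return b


def suppes_bound_holds(weights, tol=1e-12):
    """Check P(not conclusion) <= sum over premises of P(not premise)."""
    lhs = prob(weights, lambda a, b: not conclusion(a, b))
    rhs = sum(prob(weights, lambda a, b, g=g: not g(a, b)) for g in premises)
    return lhs <= rhs + tol  # small tolerance for floating-point rounding


rng = random.Random(0)  # fixed seed so the run is reproducible


def random_distribution():
    raw = [rng.random() for _ in WORLDS]
    total = sum(raw)
    return [x / total for x in raw]


all_ok = all(suppes_bound_holds(random_distribution()) for _ in range(1000))
print(all_ok)
```

Random sampling only fails to find a counterexample; the general claim rests on the proposition itself.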



Example (Kyburg’s 1965 ‘Lottery Paradox’)
Imagine a fair lottery with n tickets. Let γ1, . . . , γn be sentences:

γi : ticket i is the winner.

Each sentence ¬γi might seem reasonable, since it has probability (n − 1)/n.

However, putting these all together rapidly decreases probability. Indeed,

$$P\Big(\bigwedge_{i \le n} \neg\gamma_i\Big) = 0.$$

This demonstrates that the (logical) conclusion of all these individually reasonable premises, namely that none of the tickets will win the lottery, has probability 0.
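The lottery numbers can be reproduced on a finite model in which world i means "ticket i wins". This is a sketch under the assumption that exactly one ticket wins and all tickets are equally likely; the function name and the choice n = 100 are hypothetical.

```python
from fractions import Fraction


def lottery(n):
    """Fair lottery with n tickets; world i is the state in which ticket i wins."""
    weights = {i: Fraction(1, n) for i in range(1, n + 1)}
    # P(not gamma_i): total weight of every world except world i.
    p_not_gamma = {
        i: sum(wt for j, wt in weights.items() if j != i) for i in weights
    }
    # P(not gamma_1 and ... and not gamma_n): worlds in which no ticket wins.
    # No such world exists in this model, so the sum is empty.
    p_none_wins = sum(
        wt for j, wt in weights.items() if all(j != i for i in weights)
    )
    return p_not_gamma, p_none_wins


p_not_gamma, p_none_wins = lottery(100)
print(p_not_gamma[1], p_none_wins)
```

Each premise is nearly certain, yet their conjunction has probability 0, and the Suppes bound n × (1/n) = 1 on the negation of the conjunction is uninformative here.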



Definition (Suppes Entailment)
Γ ⊨ₛ ϕ, iff for all P:

$$P(\neg\varphi) \le \sum_{\gamma \in \Gamma} P(\neg\gamma).$$

Definition (Adams Entailment)
Γ ⊨ₐ ϕ, iff for all ε > 0, there is a δ > 0, such that for all P:

if P(¬γ) < δ for all γ ∈ Γ, then P(¬ϕ) < ε.

Lemma
Γ ⊨ₛ ϕ, iff Γ ⊨ₐ ϕ.



Theorem (Adams 1966)
Classical logic is sound and complete for Suppes/Adams entailment.



The following are equivalent:

(a) Γ ⊢ ϕ   (b) Γ ⊨ ϕ   (c) Γ ⊨ₛ ϕ   (d) Γ ⊨ₐ ϕ.

Proof.

(a) ⇔ (b) is just standard completeness.





For (b) ⇒ (c), suppose Γ ⊭ₛ ϕ, so there is a P such that

$$1 - P(\varphi) > \sum_{\gamma \in \Gamma} P(\neg\gamma).$$

From this it follows that

$$1 > \sum_{\gamma \in \Gamma} P(\neg\gamma) + P(\varphi) \ge P(\gamma_1 \wedge \cdots \wedge \gamma_n \to \varphi),$$

for any finite subset {γ1, . . . , γn} ⊆ Γ. Since all tautologies must have probability 1, no such implication γ1 ∧ · · · ∧ γn → ϕ is a tautology. Hence, by compactness, Γ ⊭ ϕ.





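For (c) ⇒ (d), suppose Γ ⊨ₛ ϕ, and ε > 0. If Γ is finite, choose

$$\delta = \frac{\varepsilon}{|\Gamma|}.$$

Then, for any P with P(¬γ) < δ for all γ ∈ Γ, we have

$$P(\neg\varphi) \le \sum_{\gamma \in \Gamma} P(\neg\gamma) < \sum_{\gamma \in \Gamma} \delta = \varepsilon.$$

Hence P(¬ϕ) < ε, and Γ ⊨ₐ ϕ. The case of Γ infinite is left as an exercise.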





Finally, for (d) ⇒ (b), suppose Γ ⊭ ϕ. We can simply take a valuation witnessing Γ ⊭ ϕ, which is itself a probability function P making P(ϕ) = 0 and P(γ) = 1, for all γ ∈ Γ. Then for any ε ≤ 1 and any δ > 0, we have P(¬γ) = 0 < δ for all γ ∈ Γ, yet P(¬ϕ) = 1 ≥ ε. Hence Γ ⊭ₐ ϕ.

(Note that essentially the same proof works to show (c) ⇒ (b).)

Thus we have shown (a) ⇔ (b) ⇒ (c) ⇒ (d) ⇒ (b). ∎



Example
In the telephone survey example, suppose each person’s statement has probability 0.999. Then the best upper bound we can get on the probability of the negation of the conclusion is

629 × 0.001 = 0.629.

That is, our best lower bound on the probability that there are at least 600 Democrats is 0.371. This seems weak.



Definition
Where Γ ⊨ ϕ, we say Γ′ ⊆ Γ is essential if Γ − Γ′ ⊭ ϕ.

Definition
Given Γ ⊨ ϕ, the degree of essentialness of γ ∈ Γ is given by

$$E(\gamma) = \frac{1}{|S_\gamma|},$$

where |Sγ| is the size of the smallest essential set containing γ.

Theorem (Adams & Levine 1975)
If Γ ⊢ ϕ, then

$$P(\neg\varphi) \le \sum_{\gamma \in \Gamma} E(\gamma)\,P(\neg\gamma).$$
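For a small premise set, degrees of essentialness can be computed by brute force over truth tables. The sketch below assumes a hypothetical toy survey with 5 atomic reports and the conclusion "at least 3 atoms are true", and an assumed error probability of 0.01 per premise; it is exponential in the number of atoms and only meant to illustrate the definitions.

```python
from fractions import Fraction
from itertools import combinations, product

N = 5          # hypothetical toy survey: 5 reports
THRESHOLD = 3  # conclusion: at least 3 of the atoms are true

# Premise i says "atom i is true"; a premise is represented by its index.
GAMMA = tuple(range(N))


def entails(premises):
    """Truth-table check that the premises entail 'at least THRESHOLD atoms true'."""
    for world in product([True, False], repeat=N):
        if all(world[i] for i in premises) and sum(world) < THRESHOLD:
            return False  # counter-model found
    return True


def is_essential(subset):
    """Gamma' is essential if Gamma minus Gamma' no longer entails the conclusion."""
    return not entails([i for i in GAMMA if i not in subset])


def degree_of_essentialness(i):
    """E(gamma_i) = 1 / size of the smallest essential set containing gamma_i."""
    others = [j for j in GAMMA if j != i]
    for size in range(1, N + 1):
        for rest in combinations(others, size - 1):
            if is_essential(set(rest) | {i}):
                return Fraction(1, size)
    raise ValueError("no essential set contains this premise")


E = {i: degree_of_essentialness(i) for i in GAMMA}
p_err = Fraction(1, 100)                         # assumed P(not gamma_i)
suppes = sum(p_err for _ in GAMMA)               # plain Suppes bound
adams_levine = sum(E[i] * p_err for i in GAMMA)  # weighted bound
print(E, suppes, adams_levine)
```

In this toy case three premises must be dropped before the conclusion can fail, so every E(γ) is 1/3 and the weighted bound is a third of the Suppes bound.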



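Example
Back to the telephone survey example, we can now obtain a much better lower bound on what seems to be a strong conclusion. At least 30 of the 629 reports must be wrong for the conclusion to fail, so the smallest essential set containing any given premise has size 30. The upper bound on the negation of the conclusion is now

$$629 \times 0.001 \times \frac{1}{30} \approx 0.02,$$

making the lower bound on the probability just below 0.98.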
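The two bounds for the survey example can be recomputed directly. The variable names below are hypothetical; the numbers are the ones used in the examples above.

```python
n_reports = 629   # people who reported being Democrats
p_error = 0.001   # assumed probability that any single report is wrong
# Reports that must be wrong before fewer than 600 remain: 629 - 600 + 1 = 30.
smallest_essential = n_reports - 600 + 1

# Upper bounds on P(not "at least 600 people are Democrats").
suppes_upper = n_reports * p_error
adams_levine_upper = n_reports * p_error / smallest_essential

# Corresponding lower bounds on the probability of the conclusion.
suppes_lower = 1 - suppes_upper
adams_levine_lower = 1 - adams_levine_upper

print(round(suppes_lower, 3), round(adams_levine_lower, 3))
```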


Part II: Consistency and Fair Odds


Consistency and Fair Odds

We strive to make judgments as dispassionate, reflective, and wise as possible by a doctrine that shows where and how they intervene and lays bare possible inconsistencies between judgments. There is an instructive analogy between [deductive] logic, which convinces one that acceptance of some opinions as ‘certain’ entails the certainty of others, and the theory of subjective probabilities, which similarly connects uncertain opinions.

–Bruno de Finetti, 1974



de Finetti’s central idea:

Probabilities as chance-based odds

Example (Shakespeare, via Girotto & Gonzalez; Howson)

Speed: Sir Proteus, save you! Saw you my master?

Proteus: But now he parted hence, to embark for Milan.

Speed: Twenty to one, then, he is shipp’d already.

(Two Gentlemen of Verona, 1592)

Gloss: A bet that collects Y if he is gone, and pays X if not, will be fair, provided X : Y = 20 : 1. These are fair betting odds.

The normalized betting odds, 20/21 and 1/21, are called betting quotients.



• S: stake
• 1_ϕ: indicator function (= 1 if ϕ holds, 0 otherwise)
• p: betting quotient for ϕ

A bet on ϕ at (positive) stake S:

• Pay S × p
• Receive S if ϕ is true, nothing otherwise

A bet against ϕ at (negative) stake S:

• Receive |S| × p
• Pay |S| if ϕ is true, nothing otherwise

In both cases the value of the gamble is given by:

S(1_ϕ − p)
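The payoff formula is easy to tabulate. The sketch below uses the Shakespeare odds as an assumed example: betting quotient 20/21 for "he is shipp'd already" and a stake of 21; the function name is hypothetical.

```python
from fractions import Fraction


def gamble_value(stake, p, phi_true):
    """Net payoff S * (1_phi - p) of a bet at stake S and betting quotient p.

    A positive stake is a bet on phi; a negative stake is a bet against phi.
    """
    indicator = 1 if phi_true else 0
    return stake * (indicator - p)


p = Fraction(20, 21)  # betting quotient at odds 20 : 1

on_true = gamble_value(21, p, True)         # bet on phi, phi true
on_false = gamble_value(21, p, False)       # bet on phi, phi false
against_true = gamble_value(-21, p, True)   # bet against phi, phi true
against_false = gamble_value(-21, p, False) # bet against phi, phi false

# If phi has probability p, the expected value of the bet is zero.
expected = p * on_true + (1 - p) * on_false
print(on_true, on_false, against_true, against_false, expected)
```

At stake 21 the bettor on ϕ collects 1 if ϕ holds and pays 20 otherwise, matching the 20 : 1 gloss, and the bet against ϕ has exactly the opposite payoffs.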



The probability calculus is about probabilities of logically complex propositions. What can we say about fair betting odds in this setting?

de Finetti’s critical assumption (cf. Skyrms 1975, Howson 2007, etc.):

(A) Fair gambles do not become collectively unfair on collection into a joint gamble.

Assumption (A), a kind of agglomerativity assumption, will allow us to reason about probabilities, by reasoning about which gambles are fair on the basis of the assumed fairness of other gambles.



Proposition

1. If p is a fair betting quotient for ϕ, then 1 − p is fair for ¬ϕ.

2. If ⊨ ¬(ϕ ∧ ψ), with fair betting quotients p and q on ϕ and ψ, respectively, then p + q is a fair betting quotient for ϕ ∨ ψ.

Proof.

For 1., note that

S(1_ϕ − p) = −S[1_{¬ϕ} − (1 − p)].

For 2., note that

S(1_ϕ − p) + S(1_ψ − q) = S[1_{ϕ∨ψ} − (p + q)].

We now invoke principle (A). ∎



Fundamental idea:

Take fair betting quotients p for formulas ϕ to be probabilities P(ϕ).

The axioms of probability are thus axioms of consistency.



Dutch Book Theorem

If we conceive of principle (A) as telling us what combinations of bets a person ought to accept, then we can prove:

Theorem (de Finetti 1937)
Someone with judgments P : L → [0, 1] is subject to a sure loss, iff P is inconsistent with the probability axioms.



Dutch Book Theorem

Example
Suppose ⊨ ¬(ϕ ∧ ψ), and P(ϕ ∨ ψ) = r, while P(ϕ) = p, P(ψ) = q, and r > p + q. Such a person would then sell bets

−1(1_ϕ − p) and −1(1_ψ − q)

while buying

1(1_{ϕ∨ψ} − r),

which results in a sure loss:

−r + p + q < 0.
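The sure loss can be checked outcome by outcome. The sketch below assumes hypothetical quotients p = 3/10, q = 2/10 and r = 6/10 (so r > p + q) and enumerates the three truth-value combinations compatible with ϕ and ψ being mutually exclusive.

```python
from fractions import Fraction


def gamble_value(stake, quotient, holds):
    """Net payoff S * (1_phi - p) of a bet at the given stake and quotient."""
    return stake * ((1 if holds else 0) - quotient)


# Hypothetical incoherent judgments with r > p + q.
p, q, r = Fraction(3, 10), Fraction(2, 10), Fraction(6, 10)


def net_payoff(phi, psi):
    """Agent's total payoff for given truth values of phi and psi (never both true)."""
    return (
        gamble_value(-1, p, phi)            # sells the bet on phi
        + gamble_value(-1, q, psi)          # sells the bet on psi
        + gamble_value(1, r, phi or psi)    # buys the bet on phi-or-psi
    )


outcomes = [(True, False), (False, True), (False, False)]
payoffs = [net_payoff(phi, psi) for phi, psi in outcomes]
print(payoffs)
```

The payoff is p + q − r in every outcome, so the loss does not depend on how the world turns out.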



Suppose P : L′ → [0, 1] is defined on some finite sublanguage L′ ⊆ L.

Theorem (de Finetti 1974)
The following are equivalent:

1. P does not conflict with the probability axioms (i.e., is consistent).

2. P is the restriction of a probability function on all of L.

3. No system of bets with odds as given by P leads to certain loss.

In fact, de Finetti proved something stronger, for general (potentially uncountable) probability spaces.

Many have argued that 3. is merely for dramatic effect, with consistency the fundamental notion. (See especially papers by Colin Howson.)


Part III: Patterns of Plausible Reasoning


Patterns of Plausible Reasoning

Our theme is simply: Probability Theory as Extended Logic. The “new” perception amounts to the recognition that the mathematical rules of probability theory are not merely rules for calculating frequencies of “random variables”; they are also the unique consistent rules for conducting inference (i.e. plausible reasoning) of any kind . . .

–E. T. Jaynes, 1995



Deductive inference

All wombats are wild animals
No wild animals snore
------------------------------
No wombats snore

Jose is either in Frankfurt or Paris
Jose is not in Paris
------------------------------
Jose is in Frankfurt

...



Plausible Inference

Example (Jaynes 1995)
Suppose some dark night a policeman walks down a street, apparently deserted; but suddenly he hears a burglar alarm, looks across the street, and sees a jewelry store with a broken window. Then a gentleman wearing a mask comes crawling out through the broken window, carrying a bag which turns out to be full of expensive jewelry. The policeman doesn’t hesitate at all in deciding that this gentleman is dishonest.

ϕ being true makes ψ more plausible
ψ is true
------------------------------
ϕ becomes more plausible

(N.B. This is one of Polya’s (1954) five patterns of plausible inference.)



Jaynes’ three assumptions guiding a theory of probability:

1. Degrees of plausibility should be real numbers.

2. We must respect logic in requiring consistency of assessment.

3. The aim is to capture intuitive reasoning patterns about plausibility.

ϕ|ψ

How plausible is ϕ given that ψ is true?



ϕ ∧ ψ|χ

• In order for ϕ ∧ ψ to be true, ψ should be true. Thus ψ|χ should somehow be involved.
• If ψ is true, we still need ϕ to be true as well. So ϕ|ψ ∧ χ should somehow be involved.
• Thus, we conclude that there should be some monotonic function F : ℝ × ℝ → ℝ, for which

$$\varphi \wedge \psi \mid \chi = F\big((\psi \mid \chi),\, (\varphi \mid \psi \wedge \chi)\big).$$

• Consistency requires that ϕ ∧ ψ|χ and ψ ∧ ϕ|χ coincide, and thus

$$F\big((\psi \mid \chi),\, (\varphi \mid \psi \wedge \chi)\big) = F\big((\varphi \mid \chi),\, (\psi \mid \varphi \wedge \chi)\big).$$



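As a matter of logic,

$$\models \big((\varphi \wedge \psi) \wedge \chi\big) \leftrightarrow \big(\varphi \wedge (\psi \wedge \chi)\big).$$

Hence consistency requires that

(ϕ ∧ ψ) ∧ χ | δ = ϕ ∧ (ψ ∧ χ) | δ,

i.e., that

$$F\Big(F\big((\chi \mid \delta),\, (\psi \mid \chi \wedge \delta)\big),\, (\varphi \mid \psi \wedge \chi \wedge \delta)\Big) = F\Big((\chi \mid \delta),\, F\big((\psi \mid \chi \wedge \delta),\, (\varphi \mid \psi \wedge \chi \wedge \delta)\big)\Big).$$

Or more generally,

$$F\big(F(x, y), z\big) = F\big(x, F(y, z)\big).$$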



Lemma (Cox 1946; Jaynes 1995)
Under certain further assumptions about F (e.g., that it is continuous), there is a unique set of solutions to these requirements. Namely, there must be a function w : ℝ → ℝ such that

w(ϕ ∧ ψ|χ) = w(ψ|χ) w(ϕ|ψ ∧ χ).
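One way to see why a rescaling w helps: any combination rule of the form F(x, y) = w⁻¹(w(x) · w(y)) is automatically associative, because multiplication is. The sketch below is only an illustration of that direction, with a hypothetical logistic rescaling from an unbounded plausibility scale to (0, 1); it does not reproduce the Cox/Jaynes uniqueness argument.

```python
import math


def w(x):
    """Hypothetical rescaling from a raw plausibility scale (all reals) to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def w_inv(u):
    """Inverse of w on (0, 1)."""
    return math.log(u / (1.0 - u))


def F(x, y):
    """Combination rule induced by w, so that w(F(x, y)) = w(x) * w(y)."""
    return w_inv(w(x) * w(y))


x, y, z = 0.3, -1.2, 2.0   # arbitrary raw plausibilities
left = F(F(x, y), z)       # F(F(x, y), z)
right = F(x, F(y, z))      # F(x, F(y, z))
print(left, right)
```

The two sides agree up to floating-point rounding, since both equal w⁻¹(w(x) w(y) w(z)).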



w(ϕ ∧ ψ|χ) = w(ψ|χ) w(ϕ|ψ ∧ χ)    (1)

Evidently, if ϕ is certain, then

ϕ ∧ ψ|χ = ψ|χ and ϕ|ψ ∧ χ = ϕ|χ.

Hence, plugging into (1) we obtain w(ψ|χ) = w(ψ|χ) w(ϕ|χ), i.e.,

w(ϕ|χ) = 1.

Likewise, if ϕ is impossible, then

ϕ ∧ ψ|χ = ϕ|χ and ϕ|ψ ∧ χ = ϕ|χ.

Hence, again appealing to (1) we obtain w(ϕ|χ) = w(ψ|χ) w(ϕ|χ), i.e.,

w(ϕ|χ) = 0.

That is, in any case, we always have 0 ≤ w(ϕ|χ) ≤ 1.



Interim Summary

• Under basic assumptions, we have shown that there should always be weights w for plausibilities ϕ|ψ in between 0 and 1.
• These weights should obey the following equation:

w(ϕ ∧ ψ|χ) = w(ψ|χ) w(ϕ|ψ ∧ χ).

• What about w(¬ϕ|ψ)?



Jaynes argues that there should be some S such that:

w(¬ϕ|ψ) = S(w(ϕ|ψ)),

with S continuous and monotone decreasing.

Furthermore, by consistency we must have

w(ϕ ∧ ψ|χ) = w(ϕ|χ) S(w(¬ψ|ϕ ∧ χ)).

Lemma (Cox 1946; Jaynes 1995)
The only way of satisfying these requirements is if

w(¬ϕ|ψ) = 1 − w(ϕ|ψ).



Theorem (Cox 1946; Jaynes 1995)
The requirements discussed above ensure there is P : L × L → [0, 1] such that:

• 0 ≤ P(ϕ|ψ) ≤ 1;
• P(¬ϕ|ψ) = 1 − P(ϕ|ψ);
• P(ϕ ∧ ψ|χ) = P(ψ|χ) P(ϕ|ψ ∧ χ).

As Jaynes says, we can now use logic and the rules above to assign plausibilities/probabilities to any complex sentence on the basis of those assigned to simpler sentences.



Example (Sum Rule)

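$$
\begin{aligned}
P(\varphi \vee \psi \mid \chi) &= 1 - P(\neg\varphi \wedge \neg\psi \mid \chi) \\
&= 1 - P(\neg\varphi \mid \chi)\,P(\neg\psi \mid \neg\varphi \wedge \chi) \\
&= 1 - P(\neg\varphi \mid \chi)\,[1 - P(\psi \mid \neg\varphi \wedge \chi)] \\
&= P(\varphi \mid \chi) + P(\neg\varphi \wedge \psi \mid \chi) \\
&= P(\varphi \mid \chi) + P(\psi \mid \chi)\,P(\neg\varphi \mid \psi \wedge \chi) \\
&= P(\varphi \mid \chi) + P(\psi \mid \chi)\,[1 - P(\varphi \mid \psi \wedge \chi)] \\
&= P(\varphi \mid \chi) + P(\psi \mid \chi) - P(\psi \mid \chi)\,P(\varphi \mid \psi \wedge \chi) \\
&= P(\varphi \mid \chi) + P(\psi \mid \chi) - P(\varphi \wedge \psi \mid \chi).
\end{aligned}
$$

∎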
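The derivation can be sanity-checked on a concrete model. The sketch below assumes hypothetical weights on the eight valuations of (ϕ, ψ, χ), computes conditional probabilities by counting weight, and compares the first line, the second line, and the last line of the derivation.

```python
from fractions import Fraction
from itertools import product

WORLDS = list(product([True, False], repeat=3))  # valuations of (phi, psi, chi)
# Hypothetical weights on the eight valuations; they sum to 36/36 = 1.
WEIGHTS = [Fraction(k, 36) for k in (1, 2, 3, 4, 5, 6, 7, 8)]


def P(event, given):
    """Conditional probability P(event | given) over the finite model."""
    den = sum(wt for w, wt in zip(WORLDS, WEIGHTS) if given(w))
    num = sum(wt for w, wt in zip(WORLDS, WEIGHTS) if event(w) and given(w))
    return num / den


def phi(w):
    return w[0]


def psi(w):
    return w[1]


def chi(w):
    return w[2]


# First line: P(phi or psi | chi).
lhs = P(lambda w: phi(w) or psi(w), chi)
# Second line: 1 - P(not phi | chi) * P(not psi | not phi and chi).
step = 1 - P(lambda w: not phi(w), chi) * P(
    lambda w: not psi(w), lambda w: (not phi(w)) and chi(w)
)
# Last line: P(phi | chi) + P(psi | chi) - P(phi and psi | chi).
rhs = P(phi, chi) + P(psi, chi) - P(lambda w: phi(w) and psi(w), chi)
print(lhs, step, rhs)
```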



Example (Jewelry Thief)
Returning to the other example, recall the pattern we wanted to capture:

ϕ being true makes ψ more plausible
ψ is true
------------------------------
ϕ becomes more plausible

This now comes out as a natural result of the calculus. For, suppose

P(ψ|ϕ ∧ χ) > P(ψ|χ).

Then, because it is now derivable that

$$P(\varphi \mid \psi \wedge \chi) = \frac{P(\varphi \mid \chi)\,P(\psi \mid \varphi \wedge \chi)}{P(\psi \mid \chi)},$$

it follows that

P(ϕ|ψ ∧ χ) > P(ϕ|χ).
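A numeric instance makes the pattern concrete. All numbers below are made up for illustration: ϕ is "the man is a thief", ψ is the observed evidence, and χ is the background information.

```python
from fractions import Fraction


def posterior(prior, likelihood, evidence):
    """P(phi | psi & chi) = P(phi | chi) * P(psi | phi & chi) / P(psi | chi)."""
    if evidence <= 0:
        raise ValueError("P(psi | chi) must be positive")
    return prior * likelihood / evidence


prior = Fraction(1, 100)      # hypothetical P(phi | chi)
likelihood = Fraction(9, 10)  # hypothetical P(psi | phi & chi)
evidence = Fraction(1, 20)    # hypothetical P(psi | chi), smaller than the likelihood

post = posterior(prior, likelihood, evidence)
print(post)
```

Because the likelihood exceeds P(ψ|χ), the ratio multiplying the prior is greater than 1, so observing ψ raises the plausibility of ϕ.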


Summary and Preview

• Many have claimed that probability should be seen as a part of (extended) logic. We have seen three sources of such claims:
    • Logical inference preserves not just truth, but also probability.
    • Probability, like logic, is about consistency.
    • Probability is the theory of reasonable inference, going beyond logic.
• Next time we will look more closely at these ‘patterns of reasonable inference’, focusing on the relation between probabilistic notions and logical systems concerned with nonmonotonicity.

