42
Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

  • View
    266

  • Download
    5

Embed Size (px)

Citation preview

Page 1: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Artificial Intelligence

Uncertainty & probabilityChapter 13, AIMA

Page 2: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

”When an agent knows enough facts aboutits environment, the logical approach enablesit to derive plans that are guaranteed to work.”

”Unfortunately, agents almost never have accessto the whole truth about the environment.”

From AIMA, chapter 13

Page 3: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Slide from S. Russell

Airport example

Page 4: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Why FOL fails?

Laziness: We can’t be bothered to list all possible requirements.

Ignorance: We have no complete (nor tested) theoretical model for the domain.

Practical reasons: Even if we knew everything and could be persuaded to list everything, the rules would be totally impractical to use (we can’t possibly test everything).

Page 5: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Instead, use decision theory

• Assign probability to a proposition based on the percepts (information).

• Proposition is either true or false. Probability means assigning a belief on how much we believe in it being either true or false.

• Evidence = information that the agent receives. Probabilities can (will) change when more evidence is acquired.

• Prior/unconditional probability ⇔ no evidence at all.

• Posterior/conditional probability ⇔ after evidence is obtained.

• No plan may be guaranteed to achieve the goal.

• To make choices, the agent must have preferences between the different possible outcomes of various plans.

• Utility represents the value of the outcomes, and utility theory is used to reason with the resulting preferences.

• An agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all possible outcomes.

Decision theory = probability theory + utility theory

Inspired by Sun-Hwa Hahn

Probability Utility

Page 6: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Basic probability notation

X : Random variable

• Boolean: P(X=true), P(X=false)• Discrete: P(X=a), P(X=b), ...• Continuous: P(x)dx

Page 7: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Boolean variableX ∈ {True, False} 1)()( falseXPtrueXP

Examples:P(Cavity)

P(W31)

P(Survive)P(CatchFlight)P(Cancer)P(LectureToday)...

Page 8: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Discrete variableX ∈ {a1,a2,a3,a4,...}

1)( i

iaXP

Examples:P(PlayingCard)P(Weather)P(Color)P(Age)P(LectureQual)...

Page 9: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Continuous variable XProbability density P(x)

1)( dxxP

a

a

dxxPaXaP )()(

Examples:P(Temperature)P(WindSpeed)P(OxygenLevel)P(LengthLecture)...

Page 10: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Propositions

Elementary: Cavity = True, W31 = True, Cancer = False, Card = A♠, Card = 4♥, Weather = Sunny, Age = 40, (20C < Temperature < 21C), (2 hrs < LengthLecture < 3 hrs)

Complex: (Elementary + connective)¬Cancer ∧ Cavity, Card1 = A♠ ∧ Card2 = A♣, Sunny ∧ (30C < Temperature) ∧ ¬Beer

Every proposition is either true or false. We apply a degree of belief in whether it is true or not.

Extension of propositional calculus...

Page 11: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Atomic event

A complete specification of the state of the world.

• Mutually exclusive• Exhaustive

World

{Cavity, Toothache}

cavity ∧ toothache cavity ∧ ¬toothache ¬cavity ∧ toothache ¬cavity ∧ ¬toothache

Page 12: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Prior, posterior, joint probabilities

• Prior probability P(X=a) = P(a). Our belief in X=a being true before information is collected.

• Posterior probability P(X=a|Y=b) = P(a|b).Our belief that X=a is true when we know that Y=b is true (and this is all we know).

• Joint probability P(X=a, Y=b) = P(a,b) = P(a ∧ b).Our belief that (X=a ∧ Y=b) is true.P(a ∧ b) = P(a,b) = P(a|b)P(b)

Image borrowed from Lazaric & Sceffer

Note boldface

Page 13: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Conditional probability(Venn diagram)

P(A) P(B)

P(A,B) = P(A ∧ B)

P(A|B) = P(A,B)/P(B)

Page 14: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Conditional probability examples

1. You draw two cards randomly from a deck of 52 playing cards. What is the conditional probability that the second card is an ace if the first card is an ace?

2. In a course I’m giving with oral examination the examination statistics over the period 2002-2005 have been: 23 have passed the oral examination in their first attempt, 25 have passed it their second attempt, 7 have passed it in their third attempt, and 8 have not passed it at all (after having failed the oral exam at least three times). What is the conditional probability (risk) for failing the course if the student fails the first two attempts at the oral exam?

3. (2005) 84% of the Swedish households have computers at home, 81% of the Swedish households have both a computer and internet. What is the probability that a Swedish household has internet if we know that it has a computer?

Work these out...

Page 15: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Conditional probability examples

1. You draw two cards randomly from a deck of 52 playing cards. What is the conditional probability that the second card is an ace if the first card is an ace?

2. In a course I’m giving with oral examination the examination statistics over the period 2002-2005 have been: 23 have passed the oral examination in their first attempt, 25 have passed it their second attempt, 7 have passed it in their third attempt, and 8 have not passed it at all (after having failed the oral exam at least three times). What is the conditional probability (risk) for failing the course if the student fails the first two attempts at the oral exam?

3. (2005) 84% of the Swedish households have computers at home, 81% of the Swedish households have both a computer and internet. What is the probability that a Swedish household has internet if we know that it has a computer?

51

3)ace1|ace2( P What’s the probability that both are aces?

Page 16: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Conditional probability examples

1. You draw two cards randomly from a deck of 52 playing cards. What is the conditional probability that the second card is an ace if the first card is an ace?

2. In a course I’m giving with oral examination the examination statistics over the period 2002-2005 have been: 23 have passed the oral examination in their first attempt, 25 have passed it their second attempt, 7 have passed it in their third attempt, and 8 have not passed it at all (after having failed the oral exam at least three times). What is the conditional probability (risk) for failing the course if the student fails the first two attempts at the oral exam?

3. (2005) 84% of the Swedish households have computers at home, 81% of the Swedish households have both a computer and internet. What is the probability that a Swedish household has internet if we know that it has a computer?

51

3)ace1|ace2( P %5.0

52

4

51

3)ace1()ace1|ace2()ace1,ace2( PPP

Page 17: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Conditional probability examples

1. You draw two cards randomly from a deck of 52 playing cards. What is the conditional probability that the second card is an ace if the first card is an ace?

2. In a course I’m giving with oral examination the examination statistics over the period 2002-2005 have been: 23 have passed the oral examination in their first attempt, 25 have passed it their second attempt, 7 have passed it in their third attempt, and 8 have not passed it at all (after having failed the oral exam at least three times). What is the conditional probability (risk) for failing the course if the student fails the first two attempts at the oral exam?

3. (2005) 84% of the Swedish households have computers at home, 81% of the Swedish households have both a computer and internet. What is the probability that a Swedish household has internet if we know that it has a computer?

51

3)ace1|ace2( P

%5363/15

63/8

)2 & OE1 Fail(

)2 & OE1 Fail,course Fail()2 & 1 OE Fail|course Fail(

P

PP

%5.052

4

51

3)ace1()ace1|ace2()ace1,ace2( PPP

25236315

87252363

Page 18: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Conditional probability examples

1. You draw two cards randomly from a deck of 52 playing cards. What is the conditional probability that the second card is an ace if the first card is an ace?

2. In a course I’m giving with oral examination the examination statistics over the period 2002-2005 have been: 23 have passed the oral examination in their first attempt, 25 have passed it their second attempt, 7 have passed it in their third attempt, and 8 have not passed it at all (after having failed the oral exam at least three times). What is the conditional probability (risk) for failing the course if the student fails the first two attempts at the oral exam?

3. (2005) 84% of the Swedish households have computers at home, 81% of the Swedish households have both a computer and internet. What is the probability that a Swedish household has internet if we know that it has a computer?

51

3)ace1|ace2( P

%5363/15

63/8

)2 & OE1 Fail(

)2 & OE1 Fail,course Fail()2 & 1 OE Fail|course Fail(

P

PP

%9684.0

81.0

)Computer(

)Computer,Internet()Computer|Internet(

P

PP

%5.052

4

51

3)ace1()ace1|ace2()ace1,ace2( PPP

What’s the prior probability for not passing the course? =13%

Page 19: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Inference

• Inference means computing

P(State of the world | Observed evidence)

P(Y | e)

Page 20: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Inference w/ full joint distribution

• The full joint probability distribution is the probability distribution for all random variables used to describe the world.

toothache ¬toothache

catch ¬catch catch ¬catch

cavity 0.108 0.012 0.072 0.008

¬cavity 0.016 0.064 0.144 0.576

Dentist example {Toothache, Cavity, Catch}

Page 21: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Inference w/ full joint distribution

• The full joint probability distribution is the probability distribution for all random variables used to describe the world.

toothache ¬toothache

catch ¬catch catch ¬catch

cavity 0.108 0.012 0.072 0.008 0.200

¬cavity 0.016 0.064 0.144 0.576

Dentist example {Toothache, Cavity, Catch}

P(cavity) = 0.2

Marginal probability(marginalization)

Page 22: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Inference w/ full joint distribution

• The full joint probability distribution is the probability distribution for all random variables used to describe the world.

toothache ¬toothache

catch ¬catch catch ¬catch

cavity 0.108 0.012 0.072 0.008

¬cavity 0.016 0.064 0.144 0.576

0.200

Dentist example {Toothache, Cavity, Catch}

P(cavity) = 0.2

Marginal probability(marginalization)

P(toothache) = 0.2

Page 23: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Inference w/ full joint distribution

• The full joint probability distribution is the probability distribution for all random variables used to describe the world.

toothache ¬toothache

catch ¬catch catch ¬catch

cavity 0.108 0.012 0.072 0.008

¬cavity 0.016 0.064 0.144 0.576

Dentist example {Toothache, Cavity, Catch}

z

z

zPzYPYP

zYPYP

)()|()(

),()(

Marginal probability(marginalization)

Page 24: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

toothache ¬toothache

catch ¬catch catch ¬catch

cavity 0.108 0.012 0.072 0.008

¬cavity 0.016 0.064 0.144 0.576

Dentist example {Toothache, Cavity, Catch}

1.08.0

008.0

)(

)()|(

4.02.0

08.0

)(

)()|(

6.02.0

12.0

)(

)()|(

toothacheP

toothachecavityPtoothachecavityP

toothacheP

toothachecavityPtoothachecavityP

toothacheP

toothachecavityPtoothachecavityP

Page 25: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

The general inference procedure

Inference w/ full joint distribution

y

yePePeP ),,(),()|( YYY

where is a normalization constant. This can always be computed if the full joint probability distribution is available.

),|(),|()|( catchtoothacheCavitycatchtoothacheCavitytoothacheCavity PPP

Cumbersome...erhm...totally impractical impossible, O(2n) for process

Page 26: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Independence

Independence between variables can dramatically reduce the amount of computation.

)()|(

)()|(

)()(),(

YXY

XYX

YXYX

PP

PP

PPP

We don’t need to mix independent variables in our computations

Page 27: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Independence for dentist example

Image borrowed from Lazaric & Sceffer

Oral health is independent of weather

Page 28: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Bayes’ theorem

)()|()()|(),( APABPBPBAPBAP

)(

)()|()|(

BP

APABPBAP

Page 29: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Bayes theorem example

Joe is a randomly chosen member of a large population in which 3% are heroin users. Joe tests positive for heroin in a drug test that correctly identifies users 95% of the time and correctly identifies nonusers 90% of the time.

Is Joe a heroin addict?

Example from http://plato.stanford.edu/entries/bayes-theorem/supplement.html

Page 30: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Bayes theorem example

Joe is a randomly chosen member of a large population in which 3% are heroin users. Joe tests positive for heroin in a drug test that correctly identifies users 95% of the time and correctly identifies nonusers 90% of the time.

Is Joe a heroin addict?

Example from http://plato.stanford.edu/entries/bayes-theorem/supplement.html

%23227.0)|(

1255.0)()|()()|()(

90.01%10)|( ,95.0%95)|(

97.0)(1)( ,03.0%3)(

)(

)()|()|(

posHP

HPHposPHPHposPposP

HposPHposP

HPHPHP

posP

HPHposPposHP

Page 31: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Bayes theorem: The Monty Hall Game show

See http://www.io.com/~kmellis/monty.html

© Let’s make a deal (A Joint Venture)

In a TV Game show, a contestant selects one of three doors; behind one of the doors there is a prize, and behind the other two there are no prizes.

After the contestant select a door, the game-show host opens one of the remaining doors, and reveals that there is no prize behind it. The host then asks the contestant whether he/she wants to SWITCH to the other unopened door, or STICK to the original choice.

What should the contestant do?

Page 32: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

The Monty Hall Game Showii door opensHost open },3,2,1{door behind prize

© Let’s make a deal (A Joint Venture)

Page 33: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

The Monty Hall Game Show

1)3|open( ,0)2|open( ,2/1)1|open(

2/1)()|open()open(

3/2)open(

)3()3|open()open|3(

3/1)open(

)1()1|open()open|1(

open 2door opensHost

1door selects Contestant

door opensHost open },3,2,1{door behind prize

222

3

122

2

22

2

22

2

PPP

iPiPP

P

PPP

P

PPP

i

i

i

© Let’s make a deal (A Joint Venture)

Page 34: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

The Monty Hall Game Show

1)3|open( ,0)2|open( ,2/1)1|open(

2/1)()|open()open(

3/2)open(

)3()3|open()open|3(

3/1)open(

)1()1|open()open|1(

open 2door opensHost

1door selects Contestant

door opensHost open },3,2,1{door behind prize

222

3

122

2

22

2

22

2

PPP

iPiPP

P

PPP

P

PPP

i

i

i

© Let’s make a deal (A Joint Venture)

Page 35: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Bayes theorem: The Monty Hall Game show

In a TV Game show, a contestant selects one of three doors; behind one of the doors there is a prize, and behind the other two there are no prizes.

After the contestant select a door, the game-show host opens one of the remaining doors, and reveals that there is no prize behind it. The host then asks the contestant whether he/she wants to SWITCH to the other unopened door, or STICK to the original choice.

What should the contestant do?

The host is actually asking the contestant whether he/she wants to SWITCH the choice to both other doors, or STICK to the original choice. Phrased this way, it is obvious what the optimal thing to do is.

Page 36: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Conditional independence

We say that X and Y are conditionally independent if

)|()|()|,( ZYZXZYX PPP

Z

X Y

Page 37: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Naive Bayes: Combining evidence

Assume full conditional independence and express the full joint probability distribution as:

)()|(

)()|()|(

)()|,,,(

),,,,(

1

1

21

21

CauseCauseEffect

causeCauseEffectCauseEffect

CauseCauseEffectEffectEffect

CauseEffectEffectEffect

n

ii

n

n

n

PP

PPP

PP

P

Page 38: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Naive Bayes: Dentist example

toothache ¬toothache

catch ¬catch catch ¬catch

cavity 0.108 0.012 0.072 0.008

¬cavity 0.016 0.064 0.144 0.576

108.0),,( : valueTrue

108.02.02.0

)072.0108.0(

2.0

)012.0108.0(

),,(

)()|()|(

)()|,(

),,(

cavitycatchtoothache

cavitycatchtoothache

CavityCavityCatchCavityToothache

CavityCavityCatchToothache

CavityCatchToothache

P

P

PPP

PP

P

Page 39: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Naive Bayes: Dentist example

toothache ¬toothache

catch ¬catch catch ¬catch

cavity 0.108 0.012 0.072 0.008

¬cavity 0.016 0.064 0.144 0.576

Full table has 23-1=7 independent numbers [O(2n)]

catch ¬catch

cavity 0.180 0.020

¬cavity 0.160 0.640

toothache ¬toothache

cavity 0.120 0.080

¬cavity 0.080 0.720

cavity 0.200

¬cavity 0.800

2 independent numbers

2 independent numbers

1 independent number

Page 40: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Naive Bayes application: Learning to classify text

• Use a dictionary with words (not too frequent and not to infrequent), e.g. w1 = airplane, w2 = algorithm, ...

• Estimate conditional probabilities P(wi | interesting) and P(wi | uninteresting)

• Compute P(text | interesting) and P(text | uninteresting) using Naive Bayes (and assuming that word position in text is unimportant)

i

iwPP )ginterestin|()ginterestin|text(

Where wi are the words occuring in the text.

Page 41: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

Naive Bayes application: Learning to classify text

• Then compute the probability that the text is interesting (or uninteresting) using Bayes’ theorem

)text(

)ginterestin|text()ginterestin()text|ginterestin(

P

PPP

P(text) is just a normalization factor; it is notnecessary to compute it since we are only interested in if P(interesting | text) > P(uninteresting | text)

Page 42: Artificial Intelligence Uncertainty & probability Chapter 13, AIMA

• Uncertainty arises because of laziness and ignorance: This is unavoidable in the real world.

• Probability expresses the agent’s belief in different outcomes.

• Probability helps the agent to act in a rational manner. Rational = maximize the own expected utility.

• In order to get efficient algorithms to deal with probability we need conditional independences among variables.

Conclusions