
Uncertainty

Assumptions Inherent in Deductive Logic-based Systems

• All the assertions we wish to make and use are universally true.

• Observations of the world (percepts) are complete and error-free.

• All conclusions consistent with our knowledge are equally viable.

• All the desirable inference rules are truth-preserving.

Completely Accurate Assertions

• Initial intuition: if an assertion is not completely accurate, replace it by several more specific assertions.

• Qualification Problem: would have to add too many preconditions (or might forget to add some).

• Example: "Toothache ⇒ Cavity" is not universally true; making it accurate would require something like "Toothache ⇒ Cavity ∨ GumProblem ∨ Abscess ∨ …", and the list of qualifications is almost unlimited.

Complete and Error-Free Perception

• Errors are common: the biggest problem in the use of Pathfinder for diagnosing lymph-system disorders is human error in feature detection.

• Some tests are impossible, too costly, or dangerous. “We could determine if your hip pain is really due to a lower back problem if we cut these nerve connections.”

Consistent Conclusions are Equal

• A diagnosis of either early smallpox or cowpox is consistent with our knowledge and observations.

• But cowpox is more likely (e.g., if the sores are on your cow-milking hand).

Truth-Preserving Inference

• Even if our inference rules are truth-preserving, a slight probability of error in our assertions or observations can compound quickly during chaining (e.g., resolution), and we are not estimating these errors. For example, a chain of ten assertions, each correct with probability 0.99, is entirely correct with probability only 0.99^10 ≈ 0.90 (assuming independent errors).

Solution: Reason Explicitly About Probabilities

• Full joint distributions.

• Certainty factors attached to rules.

• Dempster-Shafer Theory.

• Qualitative probability and non-monotonic reasoning.

• Possibility theory (within fuzzy logic, which itself does not deal with probability).

• Bayesian Networks.

Start with the Terminology of Most Rule-based Systems

• Atomic proposition: assignment of a value to a variable.

• Domain of possible values: variables may be Boolean, Discrete (finite domain), or Continuous.

• Compound assertions can be built with standard logical connectives.

Rule-based Systems (continued)

• State of the world (model, interpretation): a complete setting of all the variables.

• States are mutually exclusive (at most one is actually the case) and collectively exhaustive (at least one must be the case).

• A proposition is equivalent to the set of all states in which it is true; standard compositional semantics of logic applies.
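A minimal Python sketch of this view (our illustration; the Boolean variables are borrowed from the dental example used later in these slides): a state is a complete assignment to all variables, and a proposition is just the set of states in which it holds.

```python
from itertools import product

# Each state of the world is a complete setting of all (Boolean) variables.
variables = ("Toothache", "Cavity", "Catch")
states = [dict(zip(variables, values))
          for values in product((True, False), repeat=len(variables))]

# A proposition is equivalent to the set of states in which it is true.
cavity_or_toothache = [s for s in states if s["Cavity"] or s["Toothache"]]

print(len(states))               # 8 mutually exclusive, collectively exhaustive states
print(len(cavity_or_toothache))  # 6 of them satisfy Cavity v Toothache
```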

To Add Probability

• Replace variables with random variables.

• State of the world (setting of all the random variables) will be called an atomic event.

• Apply probabilities or degrees of belief to propositions: P(Weather=sunny) = 0.7.

• Alternatively, apply probabilities to atomic events – the full joint distribution.

Prior Probability

• The unconditional or prior probability of a proposition is the degree of belief accorded to it in the absence of any other information.

• Example: P(Cavity = true) or P(cavity)

• Example: P(Weather = sunny)

• Example: P(cavity ∧ (Weather = sunny))

Probability Distribution

• Allows us to talk about the probabilities of all the possible values of a random variable.

• For discrete random variables: P(Weather=sunny) = 0.7, P(Weather=rain) = 0.2, P(Weather=cloudy) = 0.08, P(Weather=snow) = 0.02.

• For the continuous case, a probability density function (p.d.f.) often can be used.

P Notation

• P(X = x) denotes the probability that the random variable X takes the value x.

• P(X) denotes a probability distribution over X.

For example:

P(Weather) = <0.7, 0.2, 0.08, 0.02>
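As a quick illustration (a sketch of ours, not from the slides), the same distribution can be written as a mapping from values to probabilities:

```python
# P(Weather) = <0.7, 0.2, 0.08, 0.02> as a value-to-probability mapping.
P_weather = {"sunny": 0.7, "rain": 0.2, "cloudy": 0.08, "snow": 0.02}

assert abs(sum(P_weather.values()) - 1.0) < 1e-9  # a distribution sums to 1
print(P_weather["sunny"])                         # P(Weather = sunny) = 0.7
```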

Joint Probability Distribution

• Sometimes we want to talk about the probabilities of all combinations of the values of a set of random variables.

• P(Weather, Cavity) denotes the probabilities of all combinations of the values of the pair of random variables (a 4×2 table of probabilities).

Full Joint Probability Distribution

• The joint probability distribution over the complete set of random variables used to describe the world is called the full joint probability distribution.

• P(Cavity, Catch, Toothache, Weather) denotes the probabilities of all combinations of the values of these random variables (a 2×2×2×4 table of probabilities with 32 entries).

Example

Toothache  Cavity  Catch  Probability
T          T       T      0.108
T          T       F      0.012
T          F       T      0.016
T          F       F      0.064
F          T       T      0.072
F          T       F      0.008
F          F       T      0.144
F          F       F      0.576

Probability of a Proposition

• Recall that any proposition a is viewed as equivalent to the set of atomic events in which a holds; call this set e(a).

• The probability of a proposition a is equal to the sum of the probabilities of the atomic events in which a holds:

P(a) = Σ_{eᵢ ∈ e(a)} P(eᵢ)

Example

P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2

P(cavity ∧ toothache) = 0.108 + 0.012 = 0.12

P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
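A short Python sketch of these computations (the helper name `prob` is ours, not from the slides): store the full joint as a table keyed by atomic events and sum the entries where the proposition holds.

```python
# Full joint distribution keyed by (Toothache, Cavity, Catch) truth values.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(holds):
    """P(a): sum the probabilities of the atomic events in which a holds."""
    return sum(p for event, p in joint.items() if holds(*event))

print(prob(lambda t, c, k: c))        # P(cavity)             = 0.2
print(prob(lambda t, c, k: c and t))  # P(cavity ^ toothache) = 0.12
print(prob(lambda t, c, k: c or t))   # P(cavity v toothache) = 0.28
```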

The Axioms of Probability

• For any proposition a, 0 ≤ P(a) ≤ 1.

• P(true) = 1 and P(false) = 0.

• The probability of a disjunction is given by P(a ∨ b) = P(a) + P(b) − P(a ∧ b).

Conditional (Posterior) Probability

• P(a|b): the probability of a given that all we know is b.

• P(cavity|toothache) = 0.8: if a patient has a toothache, and no other information is available, the probability that the patient has a cavity is 0.8.

• To be precise: P(a|b) = P(a ∧ b) / P(b), which holds whenever P(b) > 0.

Example

P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)

= (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.12 / 0.2 = 0.6

Related Example

P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)

= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.08 / 0.2 = 0.4

Normalization

• In the two preceding examples the denominator (P(toothache)) was the same, and we looked at all possible values for the variable Cavity given toothache.

• The denominator can be viewed as a normalization constant.

• We don’t have to compute the denominator -- just normalize 0.12 and 0.08 to sum to 1.
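A sketch of this normalization trick in Python, using the joint-table entries from the example above:

```python
# Normalize over the values of Cavity given toothache, without computing P(toothache).
p_cavity_toothache    = 0.108 + 0.012  # P(cavity ^ toothache)  = 0.12
p_no_cavity_toothache = 0.016 + 0.064  # P(~cavity ^ toothache) = 0.08

alpha = 1 / (p_cavity_toothache + p_no_cavity_toothache)  # normalization constant
print(alpha * p_cavity_toothache)     # P(cavity | toothache)  = 0.6
print(alpha * p_no_cavity_toothache)  # P(~cavity | toothache) = 0.4
```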

Product Rule

• From conditional probabilities we obtain the product rule:

P(a ∧ b) = P(a | b) P(b)

P(a ∧ b) = P(b | a) P(a)

Bayes’ Rule

Recall the product rule:

P(a ∧ b) = P(a | b) P(b)

P(a ∧ b) = P(b | a) P(a)

Equating right-hand sides and dividing by P(a):

P(b | a) = P(a | b) P(b) / P(a)

For multi-valued variables X and Y:

P(Y | X) = P(X | Y) P(Y) / P(X)
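As a sanity check (our addition, using the dental numbers from the earlier joint table), Bayes’ Rule reproduces the conditional probability computed directly:

```python
# Check Bayes' Rule against the full joint table for the dental domain.
p_cavity    = 0.108 + 0.012 + 0.072 + 0.008            # P(cavity)    = 0.2
p_toothache = 0.108 + 0.012 + 0.016 + 0.064            # P(toothache) = 0.2
p_toothache_given_cavity = (0.108 + 0.012) / p_cavity  # = 0.6

# P(cavity | toothache) = P(toothache | cavity) P(cavity) / P(toothache)
print(p_toothache_given_cavity * p_cavity / p_toothache)  # 0.6, matching the direct computation
```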

Bayes’ Rule with Background Evidence

Often we'll want to use Bayes' Rule conditionalized on some background evidence e:

P(Y | X, e) = P(X | Y, e) P(Y | e) / P(X | e)

Example of Bayes’ Rule

• P(stiff neck|meningitis) = 0.5

• P(meningitis) = 1/50,000

• P(stiff neck) = 1/20

• Then P(meningitis|stiff neck) =

P(meningitis | stiff neck) = P(stiff neck | meningitis) P(meningitis) / P(stiff neck)

= (0.5)(1/50,000) / (1/20) = 0.0002
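The same computation as a short Python sketch:

```python
# Diagnostic inference from causal knowledge via Bayes' Rule.
p_sn_given_m = 0.5         # P(stiff neck | meningitis)
p_m          = 1 / 50_000  # P(meningitis)
p_sn         = 1 / 20      # P(stiff neck)

print(p_sn_given_m * p_m / p_sn)  # P(meningitis | stiff neck) = 0.0002
```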

Normalization with Bayes’ Rule

• P(stiff neck|meningitis) and P(meningitis) are relatively easy to estimate from medical records.

• Prior probability of stiff neck is harder to estimate accurately.

• Bayes’ rule with normalization:

P(Y | X) = α P(X | Y) P(Y), where α is a normalization constant that makes the entries of P(Y | X) sum to 1.

Normalization with Bayes’ Rule (continued)

• Might be easier to compute P(stiff neck | meningitis) P(meningitis) and P(stiff neck | ¬meningitis) P(¬meningitis) than to compute P(stiff neck).
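A sketch of this normalization route for the meningitis example. The slides do not give the ¬meningitis product, so for illustration we back-compute it from P(stiff neck) = 1/20; in practice it would be estimated directly (e.g., from medical records).

```python
p_sn_given_m = 0.5
p_m          = 1 / 50_000
p_sn_and_m   = p_sn_given_m * p_m  # P(stiff neck | meningitis) P(meningitis) = 0.00001

# Hypothetical: back-computed from P(stiff neck) = 1/20 purely for illustration.
p_sn_and_not_m = 1 / 20 - p_sn_and_m  # P(stiff neck | ~meningitis) P(~meningitis) = 0.04999

alpha = 1 / (p_sn_and_m + p_sn_and_not_m)  # alpha = 1 / P(stiff neck)
print(alpha * p_sn_and_m)                  # P(meningitis | stiff neck) = 0.0002
```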

Why Use Bayes’ Rule

• Causal knowledge such as P(stiff neck|meningitis) often is more reliable than diagnostic knowledge such as P(meningitis|stiff neck).

• Bayes’ Rule lets us use causal knowledge to make diagnostic inferences (derive diagnostic knowledge).

Boldface Notation

• Sometimes we want to write an equation that holds for a vector (ordered set) of random variables. We will denote such a set by boldface font. So plain Y denotes a single random variable, while boldface Y denotes a set of random variables.

• Y = y denotes a setting for the single variable Y, while boldface Y = y denotes a setting for all the variables in the set Y.

Marginalization & Conditioning

• Marginalization (summing out): for any sets of variables Y and Z:

P(Y) = Σ_{z ∈ Z} P(Y, z)

• Conditioning (variant of marginalization):

P(Y) = Σ_{z ∈ Z} P(Y | z) P(z)

(Here z ranges over all possible settings of the variables in Z.)

• Often we want to do this for P(Y | X) instead of P(Y). Recall P(Y | X) = P(X, Y) / P(X).
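A minimal sketch of marginalization over the dental joint table (our construction, not from the slides):

```python
# P(Cavity) by summing out Toothache and Catch from the full joint distribution.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

# P(Cavity = v) = sum over z of P(Cavity = v, z), z ranging over (Toothache, Catch).
p_cavity = {v: sum(p for (t, c, k), p in joint.items() if c == v)
            for v in (True, False)}
print(p_cavity)  # {True: 0.2, False: 0.8}
```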

General Inference Procedure

• Let X be the random variable whose probabilities we want to know, given some evidence (values e for a set E of other variables). Let the remaining (unobserved) variables be Y. The query is P(X|e), and it can be answered by

P(X | e) = α P(X, e) = α Σ_y P(X, e, y)

where α = 1/P(e) is a normalization constant and y ranges over all possible settings of the unobserved variables Y.