
About Assessing and Evaluating Uncertain Inferences Within the Theory of Evidence

Thomas KÄMPKE
Forschungsinstitut für anwendungsorientierte Wissensverarbeitung (FAW) an der Universität Ulm, 7900 Ulm, FRG

The handling of uncertain facts and rules in an inference system will be discussed. The assessment and evaluation of uncertainties will be done within Dempster's and Shafer's theory of evidence. The relation between this theory and classical probability theory will be stressed.

Keywords: Associated Random Variables, Belief Functions, Evidence, Inference Systems, Production Rules.

Thomas Kämpke is on the staff of the recently founded Forschungsinstitut für anwendungsorientierte Wissensverarbeitung (FAW) at the University of Ulm, W. Germany. He held positions at the Technical University of Aachen and at the University of Passau, both W. Germany, and was a visitor at the University of California, Berkeley. His research interests include scheduling, optimization, stochastic modeling and decision analysis. He is currently involved in the development of an environmental information system. His papers appeared in Journal of Applied Probability, Operations Research, Advances in Applied Probability, Annals of Operations Research, and Communications in Statistics.

North-Holland Decision Support Systems 4 (1988) 433-439

1. Introduction

The ability to handle uncertainty is a desirable feature of software systems in many areas including operations research. The variety of applications is too large to even think of a unified approach to this capability. But when restricting attention, e.g., to (rule based) expert systems, some general properties of assessing uncertainty can be provided. A rule and the structure imposed on it to facilitate this assessment should be considered as a unity rather than as different items. As a consequence, a logically equivalent formulation of only a few rules holding with certainty may be something completely different in an uncertain environment. Examples are given below. However, the assessment should be done in a way agreeable to the case of certainty, also from an intuitive point of view. Furthermore, we will require that numerical values assessed for uncertainties interrelate in an intuitive way. Probability theory will play a major role in both the assessment and the evaluation of systems of inferences. Independence of uncertain facts and rules can be formulated analogously to independence of random variables, and assuming independence where it is absent will be shown to result in a systematic bias. Monte Carlo simulation, especially antithetic simulation, will be demonstrated to be one method of evaluating systems of uncertain inferences.

The theory of evidence has been developed by Dempster and later on by Shafer to model uncertain knowledge, called belief or evidence, about facts under consideration. These facts may, e.g., be symptoms or a diagnosis in a medical context. The basic idea behind the theory of evidence is a separation of disbelief in a fact from ignorance about it. This is motivated by the everyday experience that evidence not supporting a fact does not necessarily support its opposite; the evidence at hand may be too weak to decide either in favour of or against some fact. A simple example from [9]: suppose a vase has been excavated and an archaeologist has to decide whether it is ancient or

0167-9236/88/$3.50 © 1988, Elsevier Science Publishers B.V. (North-Holland)


T. Kämpke / About Assessing and Evaluating Uncertain Inferences

a modern fake. In the beginning he might assign the value .1 - on the scale [0,1] - to either possibility, because he finds some weak evidence for both of them. The value .8 will be assigned to a third alternative, "not made up his mind". Later evidence, such as a test of age, may be found to revise the original evidence. (Of course, the discovery of the label "made in . . . " would drastically clear the matter.) Updating evidence is a central issue of the theory of evidence. For a thorough treatise see [9].

More recently the theory of evidence has been carried over to model not only facts, but also rules, see e.g. [1], [3], [6], [7] and [10]. The rules are like production rules (if . . . then . . .) from expert systems and they are thought of as uncertain as well. To date there seems to be no standard way to formalize uncertainty of inferences. Some thoughts about this will be presented in the sequel, beginning with notions from the Dempster-Shafer theory.

For a finite, non-empty set Θ, called frame of discernment, a basic probability assignment is a function m: P(Θ) → [0,1] with m(∅) = 0 and Σ_{A ∈ P(Θ)} m(A) = 1, where P(Θ) denotes the power set of Θ. The function m may be considered as the probability density prob of a set-valued random variable X, where prob(X = A) = m(A), A ∈ P(Θ). A frame of discernment can be seen as a set of possibilities, where exactly one of them is true, but it is unknown which one, in general. The value of m for a set A may be interpreted as the belief that the true possibility is in A, but not in any particular subset of A. The belief that the true possibility is in A or any subset of A is given by the belief function Bel: P(Θ) → [0,1] with Bel(A) := Σ_{B ⊆ A} m(B).

For a given belief function Bel the corresponding probability assignment m is given by

m(A) = Σ_{B ⊆ A} (−1)^{|A−B|} Bel(B).

Thus, stating belief functions is equivalent to stating probability assignments, and a probability assignment m will often be explicitly stated only for those sets A with m(A) > 0, called focal sets. Note that for any belief function Bel(A) + Bel(A^c) ≤ 1; the inequality may be strict as in the example above with A = {vase ancient} and A^c = {vase not ancient}.
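These definitions can be sketched in a few lines of Python; this is a hedged illustration only, where the dict-of-frozensets representation and the function name bel are our own, and the numbers are the archaeologist's from the introduction:

```python
# Basic probability assignment on the frame {ancient, fake} from the
# vase example: weak evidence .1 for each alternative, .8 uncommitted.
frame = frozenset({"ancient", "fake"})
m = {
    frozenset({"ancient"}): 0.1,
    frozenset({"fake"}): 0.1,
    frame: 0.8,
}

def bel(m, A):
    """Bel(A): total mass of the focal sets B contained in A."""
    return sum(v for B, v in m.items() if B <= A)

print(bel(m, frozenset({"ancient"})))   # 0.1
print(bel(m, frame))                    # 1.0 (total mass)
```

Note that Bel({ancient}) + Bel({fake}) = 0.2 < 1: disbelief and ignorance are kept apart, exactly the separation the theory is built on.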

For two belief functions with probability assignments m1 and m2 over the same frame of discernment Θ the orthogonal sum ⊕ is defined, see [9], as

m1 ⊕ m2(C) := Σ_{A,B with A∩B=C} m1(A) m2(B) / Σ_{A,B with A∩B≠∅} m1(A) m2(B),

for C ≠ ∅. m1 ⊕ m2 is derived from m1 and m2 with corresponding random variables being stochastically independent, see [10, p. 45]. m1 ⊕ m2 = m2 ⊕ m1 and ⊕ is associative if applied to more than 2 operands. ⊕ facilitates updating belief by combining belief from "various sources".
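A minimal sketch of the orthogonal sum in Python (an assumed encoding: masses as dicts keyed by frozensets; the second source below is an invented illustration):

```python
from collections import defaultdict

def combine(m1, m2):
    """Orthogonal sum of two basic probability assignments given as
    dicts {frozenset: mass} over the same frame: accumulate products
    over intersecting pairs of focal sets and renormalize by the mass
    not committed to the empty set."""
    raw = defaultdict(float)
    conflict = 0.0
    for A, a in m1.items():
        for B, b in m2.items():
            C = A & B
            if C:
                raw[C] += a * b
            else:
                conflict += a * b
    if conflict >= 1.0:
        raise ValueError("total conflict: orthogonal sum undefined")
    return {C: v / (1.0 - conflict) for C, v in raw.items()}

# Two independent sources about the vase: the archaeologist's masses
# and a second source favouring "ancient".
m1 = {frozenset({"ancient"}): 0.1, frozenset({"fake"}): 0.1,
      frozenset({"ancient", "fake"}): 0.8}
m2 = {frozenset({"ancient"}): 0.6, frozenset({"ancient", "fake"}): 0.4}
m12 = combine(m1, m2)
print(m12)  # mass sums to 1 again after renormalization
```

Commutativity holds because the pairwise products are symmetric in the two arguments, and associativity allows folding combine over any number of sources.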

In the following, frames of discernment will describe the validity of one or several propositions. The simplest case will be Θ = {A, ¬A}, where A stands for: proposition A holds, and ¬A stands for: proposition A does not hold. To simplify the notation, we deliberately use the same symbol for a proposition itself and for the proposition being true. The context will always make clear the difference. The joint validity of several propositions will be described within the Cartesian product of the underlying frames of discernment.

To view propositions within a narrowing or widening frame of discernment, we will make use of the extension and projection - sometimes called restriction as in [6] or marginal as in [3] - of a belief function; comp. [9]. Let m be the probability assignment of a belief function over a frame Θ1 and Θ2 be another frame. The extension of m onto Θ1 × Θ2 is given by

(ext_{Θ2} m)(A × Θ2) := m(A),

A ∈ P(Θ1). For a belief function Bel on Θ1 × Θ2 the projection onto Θ1 is defined by

(pr_{Θ1} Bel)(A) := Bel(A × Θ2),

A ∈ P(Θ1). For properly chosen factors Θ1 and Θ2 the projection of the extension of a belief function is the original belief function, while the extension of the projection generally is not; comp. [3].

2. Modelling Rules

Suppose two propositions A and B and the rule R: A → B (if A then B) are given. We further



suppose a belief function Bel_A for A to be given by

m_A(A) = p1,
m_A(¬A) = p2,
m_A(Θ_A) = p3,

with p1 + p2 + p3 = 1, p_i ≥ 0, and Θ_A = {A, ¬A}. For the rule, a belief function Bel_{A→B} on Θ_A × Θ_B will be given, the elements of Θ_A × Θ_B being denoted by e.g. (A, ¬B), and a subset of Θ_A × Θ_B such as {(A, B), (¬A, B), (¬A, ¬B)} will be written in disjunctive form: (A, B) ∨ (¬A, B) ∨ (¬A, ¬B).

m_{A→B}((A, B) ∨ (¬A, B) ∨ (¬A, ¬B)) = q1,
m_{A→B}(Θ_A × Θ_B) = q2,

with q1 + q2 = 1 and q_i ≥ 0. Applying rule R to proposition A, which is

equivalent to the modus ponens for facts and rules with certainty, will be facilitated by forming Bel_A ⊕ Bel_{A→B}, precisely (ext_{Θ_B} Bel_A) ⊕ Bel_{A→B} (obvious extensions and projections will be omitted):

m_A ⊕ m_{A→B}((A, B)) = p1 q1,
((A, B) ∨ (¬A, B) ∨ (¬A, ¬B)) = p3 q1,
((A, Θ_B)) = p1 q2,
((¬A, Θ_B)) = p2,
(Θ_A × Θ_B) = p3 q2.

Thus, Bel_B := pr_{Θ_B}((ext_{Θ_B} Bel_A) ⊕ Bel_{A→B}) is given by

m_B(B) = p1 q1,
m_B(¬B) = 0,
m_B(Θ_B) = 1 − p1 q1.

We observe that pr_{Θ_A}((ext_{Θ_B} Bel_A) ⊕ Bel_{A→B}) = Bel_A, i.e. applying the rule A → B to A does not affect the initial belief in A. Furthermore, the probability assignment of pr_{Θ_B}(Bel_{A→B}) concentrates all mass in Θ_B; m(Θ_B) = 1. Hence, from the formulation of the rule alone, there is complete ignorance about B. Thus, the rule itself (!) neither implies any belief in the consequent B nor does it bias the belief in the antecedent A.
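The modus ponens computation above can be checked mechanically. The following sketch uses illustrative values for p1, p2, p3, q1, q2 (all names are ours); it reproduces m_B(B) = p1·q1 and the fact that the belief in A is unchanged:

```python
from collections import defaultdict

# Frame Theta_A x Theta_B as pairs (a, b) of truth values; a focal set
# is a frozenset of pairs.  The masses below are arbitrary illustrations.
p1, p2, p3 = 0.6, 0.1, 0.3
q1, q2 = 0.7, 0.3

pairs = [(a, b) for a in (True, False) for b in (True, False)]
theta = frozenset(pairs)

m_A = {                                           # ext of m_A onto the product
    frozenset(p for p in pairs if p[0]): p1,      # {A} x Theta_B
    frozenset(p for p in pairs if not p[0]): p2,  # {~A} x Theta_B
    theta: p3,
}
m_rule = {                                        # rule A -> B
    frozenset(p for p in pairs if (not p[0]) or p[1]): q1,  # (A,B) v (~A,B) v (~A,~B)
    theta: q2,
}

def combine(m1, m2):
    """Orthogonal sum; no intersection is empty here, so no renormalization."""
    out = defaultdict(float)
    for A, a in m1.items():
        for B, b in m2.items():
            out[A & B] += a * b
    return dict(out)

m_joint = combine(m_A, m_rule)
bel_B = sum(v for S, v in m_joint.items() if all(b for _, b in S))
bel_A_after = sum(v for S, v in m_joint.items() if all(a for a, _ in S))
print(bel_B)        # equals p1 * q1: only {(A, B)} lies inside Theta_A x {B}
print(bel_A_after)  # equals p1: the belief in A is unaffected by the rule
```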

Note furthermore that the application of a rule to a fact is given by ⊕. We assume that facts are independent of rules in the sense described above.

For implied evidence the following monotonicity property holds.

Lemma 1. Given an arbitrary belief function Bel_I for B. If there is independent evidence for B, denoted Bel_B, implied by some fact A and the rule A → B, then (Bel_I ⊕ Bel_B)(B) ≥ Bel_I(B) and (Bel_I ⊕ Bel_B)(¬B) ≤ Bel_I(¬B).

By the belief function Bel_B there is no disbelief in B, Bel_B(¬B) = 0. This gave rise to different ways to model rules. Lee and Shin [7] assess the belief Bel_LS in a rule by

m_LS((A, B) ∨ (¬A, B) ∨ (¬A, ¬B)) = q1,
m_LS((A, ¬B)) = q2,
m_LS(Θ_A × Θ_B) = q3,

with q1 + q2 + q3 = 1, q_i ≥ 0.

Thus, pr_{Θ_A}((ext_{Θ_B} Bel_A) ⊕ Bel_LS)(A) ≠ Bel_A(A) for q2 > 0; the belief in A "before" applying the rule differs from that "after". This seems strange, as the rule should not bias the initial belief. Moreover, (pr_{Θ_B} Bel_LS)(¬B) = q2. For q2 > 0 the rule itself - without being invoked - results in a disbelief in the consequent B! Similar effects hold within the approach by Eddy and Pei [3]. Especially a disbelief in B seems hard to justify, because the rule only states sufficiency, not necessity, of A for B. Hence, all probability mass which is not assigned to the belief in a rule will be committed to ignorance about it.

To conclude disbelief in a proposition, we require a rule whose consequent is the negation of the proposition. For example, a proposition C might - with uncertainty - imply ¬B, i.e. besides A → B there might be another rule C → ¬B. In order not to violate the intuition behind implications, it seems reasonable to assume that it takes two production rules to conclude belief and disbelief in a proposition.

This immediately gives rise to the occurrence of contradictions, because a proposition and its nega- tion may both be concluded with a certain degree of belief. Dealing with contradictions will be an issue of evaluating inferences.

We assume that propositions which do not appear as consequences in any rule in a system of inferences are independent, meaning that their set-valued random variables are stochastically independent. The independence may equivalently be stated without random variables. E.g. two propositions A and B may be called (evidentially) independent, iff

Bel_{Θ_A×Θ_B} = ext_{Θ_B}(pr_{Θ_A} Bel_{Θ_A×Θ_B}) ⊕ ext_{Θ_A}(pr_{Θ_B} Bel_{Θ_A×Θ_B}),

i.e. the joint belief in A and B is the orthogonal sum of the projected belief functions. Elementary applications of extensions and of ⊕ lead to

Lemma 2. Let A and B be independent propositions. Then
(a) Bel(A ∨ B) = Bel(A) + Bel(B) − Bel(A)Bel(B),
Bel(¬(A ∨ B)) = Bel(¬A)Bel(¬B).
(b) Bel(A ∧ B) = Bel(A)Bel(B),
Bel(¬(A ∧ B)) = Bel(¬A) + Bel(¬B) − Bel(¬A)Bel(¬B).

(E.g. A ∨ B is a short notation for (A, B) ∨ (¬A, B) ∨ (A, ¬B) within the frame Θ_A × Θ_B.)
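Lemma 2 can be checked numerically: under evidential independence the joint mass is the product of the marginal masses. A sketch under assumed encodings (True means the proposition holds; the numbers are arbitrary, with Bel(A) = .5 and Bel(B) = .4):

```python
from itertools import product

def bel_joint(mA, mB, event):
    """Belief of `event` under the joint mass of two evidentially
    independent propositions: joint masses are the products of the
    marginal masses (orthogonal sum of the extensions)."""
    total = 0.0
    for (SA, a), (SB, b) in product(mA.items(), mB.items()):
        S = {(x, y) for x in SA for y in SB}
        if S <= event:
            total += a * b
    return total

mA = {frozenset({True}): 0.5, frozenset({False}): 0.2, frozenset({True, False}): 0.3}
mB = {frozenset({True}): 0.4, frozenset({False}): 0.1, frozenset({True, False}): 0.5}

A_or_B = {(x, y) for x in (True, False) for y in (True, False)} - {(False, False)}
A_and_B = {(True, True)}

print(bel_joint(mA, mB, A_or_B))   # lemma 2(a): .5 + .4 - .5*.4 = 0.7
print(bel_joint(mA, mB, A_and_B))  # lemma 2(b): .5 * .4 = 0.2
```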

There is another difference between production systems with certainty and those with uncertainty. Consider for example 4 propositions A, B, C and D related by the rules R1: A → (B ∨ C), R2: B → D, and R3: C → D:

A → B ∨ C
B → D
C → D

If facts and rules hold with certainty, then D holds whenever A holds, i.e. the rule A → D can be derived. But if facts and rules contain uncertainty, rules R2 and R3 cannot be invoked without further assumptions: the belief in B ∨ C, implied by R1, does not tell to what degree there is belief in B resp. C alone. Even if B and C were known to be independent, there are infinitely many belief functions for B and C which lead to the given belief in B ∨ C, comp. lemma 4. We will not pursue any questions of "completeness" here.

Furthermore, elementary calculations show that the equations of lemma 2 are extreme in the following sense.

Lemma 3. Let A and B be propositions, possibly dependent. Then
(a) Bel(A ∧ B) + Bel(A ∨ B) ≥ Bel(A) + Bel(B),
(b) Bel(¬(A ∧ B)) + Bel(¬(A ∨ B)) ≥ Bel(¬A) + Bel(¬B).

Rules with uncertainty are sometimes, as in [7], simplified to the form A1 ∧ . . . ∧ An → B, i.e. the antecedent is a conjunction of propositions while the consequent is a single one. A given rule such as A → (B ∧ C) is substituted by the two rules A → B and A → C. This is of course reasonable for certain facts and rules. But it seems difficult to decompose rules under uncertainty. It is not obvious how to assess the belief functions of the single rules from the given one. Moreover, it seems impossible to combine single beliefs into the joint one for B ∧ C, because the single rules have the same antecedent, are thus dependent, and thus ⊕ is not the proper operation of combination. We will hence not decompose in the modelling process. However, whenever necessary, we will introduce propositions such as D := B ∧ C so that a given rule may formally be viewed as one having only a single proposition as consequent. This serves to simplify the terminology.

Lemma 4. Let B and C be independent propositions whose disjunction has the known belief function Bel(B ∨ C) = p ∈ (0, 1) and Bel(¬(B ∨ C)) = q ∈ [0, 1), p + q ≤ 1. Then there are infinitely many belief functions for B and C leading to the given one for B ∨ C.

Proof. For p ∈ (0, 1) select p_B ∈ (0, p) and set p_C := (p − p_B)/(1 − p_B), q_B := 1 − p_B, and q_C := q/q_B = q/(1 − p_B). The belief functions

Bel(B) = p_B, Bel(¬B) = q_B and Bel(C) = p_C, Bel(¬C) = q_C

have - by lemma 2 - the desired property. Because p_B is arbitrary in (0, p), there are infinitely many such functions.
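The construction in the proof is easy to check numerically; p, q and the sampled values of p_B below are arbitrary illustrations:

```python
# Construction from the proof of lemma 4: any pB in (0, p) yields
# marginals for B and C that reproduce Bel(B v C) = p and
# Bel(~(B v C)) = q via lemma 2.
p, q = 0.6, 0.2

for pB in (0.1, 0.3, 0.5):
    pC = (p - pB) / (1 - pB)
    qB = 1 - pB
    qC = q / qB
    disj = pB + pC - pB * pC      # lemma 2(a): Bel(B v C)
    neg_disj = qB * qC            # lemma 2(a): Bel(~(B v C))
    print(round(disj, 10), round(neg_disj, 10))  # 0.6 0.2 each time
```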

3. Evaluating Uncertain Inferences

For the moment we assume that no contradictions may occur. Later on, this assumption will be dropped. Suppose a system of facts and rules without "cycles" is given, i.e. there are no chains leading back to a former fact such as A → B → C → . . . → A.

As proposed in [6], an inference system may be considered as a reliability network, a directed graph with propositions corresponding to nodes and rules to arcs. The reliability probability of an arc (node) is the belief in the corresponding rule (fact). Remember that the initial propositions A1, . . . , An - those which are not consequences in any rule - were assumed to be independent. Then the following holds, see [6]:

Theorem 1. The belief in a proposition B is the reliability probability p_B of node B:

Bel(B) = pr_{Θ_B}(Bel_{A1} ⊕ . . . ⊕ Bel_{An} ⊕ Bel_{R1} ⊕ . . . ⊕ Bel_{Rm})(B) = p_B,

with {R1, . . . , Rm} being the set of all rules of the system.

Of course the components of the orthogonal sum of theorem 1 which do not play a role for B may be omitted, though it is generally difficult to tell what those components are. To simplify the calculations for a belief function of some proposition, one might derive it from all its antecedents only. Note that the antecedents leading to a proposition need not be evidentially independent. This is due to several antecedents possibly depending on the same proposition. We do not require the graph of the reliability network to be an intree, in order to allow the same piece of evidence to be used in several antecedents. This is important from the applicational point of view. Reusing evidence may very well fit a domain expert's paradigm of drawing conclusions (such as in a medical context). For example, the 4 propositions A, B, C and D may be related by the rules R1: A → B, R2: C → B, R3: B → D, and R4: C → D:

A → B    C → B
B → D    C → D

If the belief in D is supposed to be calculated from the belief in B and C, then we must take into account the dependence of B and C. If for instance the belief in A and C is .9 and the belief in all 4 rules is .8, then the resulting belief in D is .946. If B and C are assumed to be independent, then the belief in D increases to .952. For more complex inference networks larger deviations have to be expected.

The evaluation of the belief in a fact may generally be done by assigning each fact A resp. each rule R a binary random variable X_A resp. X_R, where e.g. X_A = 0 means that fact A does not hold, and X_R = 1 means that rule R holds if its antecedent is valid. This formulation of rules differs from the MIP (mixed integer programming) representation of clauses from propositional logic as given by e.g. Jeroslov [5] or Hooker [4]. Each rule is assigned a variable of its own; it is not made up from the variables of the involved propositions.

Thus, as known from reliability theory, the validity of a proposition can be expressed as an increasing function of the underlying random variables. For instance, in the last example we get X_D = max{X_B · X_R3, X_C · X_R4} with X_B = max{X_A · X_R1, X_C · X_R2}. The belief in D is then P(X_D = 1).

In general, if a proposition B is implied by several rules R1: A1 → B, . . . , Rk: Ak → B, then

X_B = max{X_A1 · X_R1, . . . , X_Ak · X_Rk}.

In principle, the calculation of P(X_B = 1) can be done: the distribution of X_Ai resp. X_Ri is given by P(X_Ai = 1) resp. P(X_Ri = 1) being the belief in proposition Ai resp. rule Ri. All rules are independent; the difficulty of the evaluation lies in possible dependencies of the propositions and the possibly large number of antecedents having to be evaluated first.

To simplify the calculations, the dependencies might be neglected, always leading to an overestimation, i.e. a too optimistic estimation, of the true value; comp. the last example.
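The overestimation can be seen on the last example's network. The following sketch (with arbitrary illustrative beliefs, not the figures quoted above) evaluates P(X_D = 1) exactly by enumerating all 0-1 values of the independent base variables and compares it with the value obtained by treating X_B and X_C as independent:

```python
from itertools import product

# Network R1: A -> B, R2: C -> B, R3: B -> D, R4: C -> D.
# The beliefs below are arbitrary illustrations.
bel = {"A": 0.7, "C": 0.7, "R1": 0.8, "R2": 0.8, "R3": 0.8, "R4": 0.8}
names = list(bel)

p_D = p_B = 0.0
for vals in product((0, 1), repeat=len(names)):
    x = dict(zip(names, vals))
    prob = 1.0
    for n in names:
        prob *= bel[n] if x[n] else 1 - bel[n]
    xB = max(x["A"] * x["R1"], x["C"] * x["R2"])
    xD = max(xB * x["R3"], x["C"] * x["R4"])
    p_B += prob * xB
    p_D += prob * xD

# Neglecting the dependence of B and C (B is partly implied by C):
p_indep = 1 - (1 - p_B * bel["R3"]) * (1 - bel["C"] * bel["R4"])
print(p_D, p_indep)  # the independence assumption overestimates
```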

The result is based on the association of ran- dom variables, see e.g. [2] for the definition and the next lemma.

Definition. A set of real-valued random variables X1, . . . , Xn is called associated, iff for all non-decreasing functions f, g: R^n → R: Cov(f(X1, . . . , Xn), g(X1, . . . , Xn)) ≥ 0.

Two important properties of associated random variables are the following.

Lemma 5. (a) If X1, . . . , Xn are independent, then X1, . . . , Xn are associated. (b) If X1, . . . , Xn are associated, then f_i(X1, . . . , Xn) are associated for arbitrary nondecreasing functions f_i: R^n → R, i = 1, . . . , m ∈ N. (c) If X1, . . . , Xn are associated, then for all t ∈ R: P(X1 ≤ t, . . . , Xn ≤ t) ≥ P(X1 ≤ t) · . . . · P(Xn ≤ t).



Lemma 6. If the initial propositions of an inference system are independent, then for any set of propositions A1, . . . , Ak the random variables (a) X_A1, . . . , X_Ak and (b) X_A1 · X_R1, . . . , X_Ak · X_Rk are associated.


Proof. Trivial by lemma 5, since all variables X_A1, . . . , X_Ak resp. X_A1 · X_R1, . . . , X_Ak · X_Rk are nondecreasing functions of the random variables of the initial propositions and the involved rules.

Altogether this gives the next theorem.

Theorem 2. Bel(B) = P(X_B = 1) ≤ P*(X_B = 1), with P* being the measure for independent antecedents.

Proof. By the independence of the rules from the propositions and the independence of all the rules from each other, we obtain

P(X_B = 1)
= P(max{X_A1 · X_R1, . . . , X_Ak · X_Rk} = 1)
= 1 − P(max{X_A1 · X_R1, . . . , X_Ak · X_Rk} = 0)
= 1 − P(X_A1 · X_R1 ≤ 0, . . . , X_Ak · X_Rk ≤ 0)
≤ 1 − P(X_A1 · X_R1 ≤ 0) · . . . · P(X_Ak · X_Rk ≤ 0),

the inequality holding by lemmas 5 and 6. Clearly, the last term is equal to P*(X_B = 1).

Simulation of an inference network is another way to evaluate it. All random variables X_R resp. X_A for initial propositions will be realized. The validity of all propositions one is interested in can then be evaluated deterministically. This procedure is repeated independently, say N times, and for some proposition B of interest Bel(B) = P(X_B = 1) is approximated by

1/N · Σ_{1 ≤ j ≤ N} X_B,j,

where X_B,j is the validity of proposition B in run j.

All random variables X_A resp. X_R which have to be drawn will be transformed from independent 0-1 uniform random variables U. For instance for A: X_A = 1 if U > 1 − Bel(A), 0 otherwise. Thus, P(X_A = 1) = 1 − (1 − Bel(A)) = Bel(A) and X_A is increasing in U. The random variable X_B is increasing in all underlying 0-1 random variables. This allows for another use of the association of all the variables X_A1, . . . , X_Ak: the efficiency of the simulation can be increased by antithetic sampling. All 0-1 uniform variables U can be used twice by forming X_B,j from 1 − U as well, denoted by X_B,j,ant. The estimator

1/N · Σ_{1 ≤ j ≤ N} (X_B,j + X_B,j,ant)/2

can be shown, see e.g. [8], to have a smaller variance than the estimator given above, even if the latter is based on 2N instead of N simulation runs.
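The transformation from uniforms and the antithetic reuse of 1 − U can be sketched on the four-rule example network from above (beliefs and all names are our own illustrations; X = 1 iff U > 1 − Bel, so P(X = 1) = Bel):

```python
import random

# Monte Carlo evaluation with antithetic sampling for the network
# R1: A -> B, R2: C -> B, R3: B -> D, R4: C -> D.
bel = {"A": 0.7, "C": 0.7, "R1": 0.8, "R2": 0.8, "R3": 0.8, "R4": 0.8}

def network(x):
    """X_D as an increasing function of the underlying 0-1 variables."""
    xB = max(x["A"] * x["R1"], x["C"] * x["R2"])
    return max(xB * x["R3"], x["C"] * x["R4"])

def antithetic_estimate(n_runs, rng):
    total = 0.0
    for _ in range(n_runs):
        u = {k: rng.random() for k in bel}
        x = {k: int(u[k] > 1 - bel[k]) for k in bel}         # X_B,j
        x_ant = {k: int(1 - u[k] > 1 - bel[k]) for k in bel}  # X_B,j,ant
        total += (network(x) + network(x_ant)) / 2
    return total / n_runs

print(antithetic_estimate(20000, random.Random(1)))
```

For these beliefs, exact enumeration gives P(X_D = 1) ≈ 0.7965, which the estimate approaches; the negative correlation between a run and its antithetic partner is what reduces the variance.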

We will now allow inference systems with propositions having negations, such as the following: suppose there are 3 propositions A, B, and C related by the rules R1: A → C, R2: B → C, and R3: ¬B → ¬C:

A → C    B → C
¬B → ¬C

If both A and ¬B are true and both the rules R1 and R3 hold, then there is the contradiction C ∧ ¬C. Contradictions may of course be implied without the initial propositions being contradictory.

When evaluating such inference systems - by the definition of ⊕ - all belief implied is calculated in the same way as in systems without negations, with one exception: the probabilities calculated for a proposition have to be normalized by the probability that there is no contradiction for this proposition = 1 − probability that there is a contradiction. Note that contradictions, i.e. the simultaneous validity of a proposition, e.g. C, and its negation ¬C, are given by ∅. When forming the orthogonal sum, these are exactly the cases being counted in the denominator, see above.

In these situations, simulation again is a means to evaluate inference systems. However, some of the simulation runs have to be rejected: the runs in which there are contradictions must not be counted. On the other hand, not all probability mass assigned to the belief in a proposition can be assigned to its negation. It may well be that no conclusion about a proposition can be reached, by e.g. failure of some rules. This is essential to belief functions. Thus, when evaluating the belief function of a proposition C, we have to take into account explicitly at least two out of three cases: C, ¬C and Θ_C. For each proposition C we will



hence introduce two binary random variables X_C and X_¬C describing the validity of C and ¬C, where e.g. X_C = X_¬C = 1 denotes a contradiction. Thus, Bel(C) = P(X_C = 1, X_¬C = 0)/P(X_C + X_¬C ≤ 1). If an inference system is simulated N times, we approximate Bel(C) by

(number of runs with X_C = 1 and X_¬C = 0)/N
/ (number of runs with X_C + X_¬C ≤ 1)/N
= (number of runs with X_C = 1 and X_¬C = 0)
/ (number of runs with X_C + X_¬C ≤ 1).
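This rejection scheme can be sketched on the three-rule example above; all numbers and names are our own illustrations, and the set-valued variable for B realizes {B}, {¬B} or the whole frame according to its masses:

```python
import random

# Simulation with contradictions for R1: A -> C, R2: B -> C, R3: ~B -> ~C.
# Runs with X_C = X_notC = 1 are rejected; Bel(C) is estimated as the
# normalized fraction of accepted runs with X_C = 1 and X_notC = 0.
bel_A = 0.8
rules = {"R1": 0.9, "R2": 0.9, "R3": 0.9}
mB = {"B": 0.5, "notB": 0.3}          # remaining 0.2 is ignorance about B

def estimate_bel_C(n_runs, rng):
    hits = accepted = 0
    for _ in range(n_runs):
        xA = rng.random() < bel_A
        u = rng.random()               # one draw realizes {B}, {~B} or Theta_B
        xB = u < mB["B"]
        xnotB = mB["B"] <= u < mB["B"] + mB["notB"]
        r = {k: rng.random() < p for k, p in rules.items()}
        xC = (xA and r["R1"]) or (xB and r["R2"])
        xnotC = xnotB and r["R3"]
        if xC and xnotC:
            continue                   # contradiction: reject this run
        accepted += 1
        hits += xC and not xnotC
    return hits / accepted

print(estimate_bel_C(50000, random.Random(7)))
```

With these numbers the normalization matters: P(X_C = 1) ≈ .846 unconditionally, while the estimate converges to P(X_C = 1, X_¬C = 0)/P(X_C + X_¬C ≤ 1) ≈ .809.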

Antithetic sampling can also be applied in these situations. For some proposition C whose belief is to be evaluated, we define f(X_C, X_¬C) := 1 if X_C = 1 and X_¬C = 0, 0 otherwise. N_acc denotes the number of accepted runs out of N trials. X_C,j (X_¬C,j) denotes the outcome of X_C (X_¬C) in run j. Instead of the estimator

1/N_acc · Σ_{j ∈ Acc} f(X_C,j, X_¬C,j),

we can use the antithetic estimator

1/N_acc · Σ_{j ∈ Acc} f(X_C,j, X_¬C,j)/2 + 1/N_acc,ant · Σ_{j ∈ Acc,ant} f(X_C,j,ant, X_¬C,j,ant)/2,

where Acc is the set of accepted simulation runs out of {1, . . . , N} and all terms with index "ant" are those from the antithetic runs; N_acc = |Acc|.

Lemma 7. The estimator and the antithetic estima- tor are unbiased.

Proof. We give the proof for the estimator 1/N_acc · Σ_{j ∈ Acc} f(X_C,j, X_¬C,j) only. The antithetic estimator is dealt with in the same way.

E(1/N_acc · Σ_{j ∈ Acc} f(X_C,j, X_¬C,j))

= Σ_{1 ≤ k ≤ N} E(1/k · Σ_{j ∈ Acc, |Acc| = k} f(X_C,j, X_¬C,j) | N_acc = k) · P(N_acc = k)

= Σ_{1 ≤ k ≤ N} 1/k · Σ_{1 ≤ j ≤ k} E(f(X_C,j, X_¬C,j) | N_acc = k) · P(N_acc = k)

= Σ_{1 ≤ k ≤ N} E(f(X_C, X_¬C) | X_C + X_¬C ≤ 1) · P(N_acc = k)

= E(f(X_C, X_¬C) | X_C + X_¬C ≤ 1)

= P(X_C = 1, X_¬C = 0 | X_C + X_¬C ≤ 1)

= Bel(C).

Note that N_acc = k is short for: there are exactly k indices j ∈ {1, . . . , N} with X_C,j + X_¬C,j ≤ 1.

The monotonicity of f ensures variance reduction of the antithetic estimator over the unrefined one. Moreover, empirical results showed that in inference systems with contradictions the reduction of variance was greater than in systems without.

Taking all this into account, we conclude that the Dempster-Shafer theory, at least when applied to inference systems, appears to be not too far away from "classical" Bayes theory. Methods from probability theory may well serve the theory of evidence.

References

[1] Barnett, J.A.: Computational methods for a mathematical theory of evidence, Proc. Int. Joint Conf. on Artif. Intell., 1981, p. 868-875.

[2] Barlow, R.E., Proschan, F.: Statistical theory of reliability and life testing, To Begin With, Silver Spring, MD, 1981.

[3] Eddy, W.F., Pei, G.P.: Structure of rule based belief functions, IBM J. Res. Develop. 30, 1986, p. 93-101.

[4] Hooker, J.N.: A quantitative approach to logical in- ference, Dec. Sup. Syst. 4, 1988, p. 45-69.

[5] Jeroslov, R.: Spatial imbeddings for linear and for logic structures, Dec. Sup. Syst. 4, 1988, p. 71-86.

[6] Kohlas, J.: Conditional belief structures, report 131, Uni- versity of Fribourg/CH, Institute for Automation and OR, 1987.

[7] Lee, S., Shin, K.G.: Uncertain inference using belief functions, Proc. 3rd Conf. Artif. Intell. Appl., Kissimmee, FL, 1987, p. 238-243.

[8] Ross, S.M.: Introduction to probability models, Academic Press, Orlando, FL, 1985.

[9] Shafer, G.: A mathematical theory of evidence, Princeton University Press, Princeton, 1976.

[10] Shafer, G., Shenoy, P.P., Mellouli, K.: Propagating belief functions in qualitative Markov trees, working paper, School of Business, University of Kansas, 1986.