
4th International Symposium on Imprecise Probabilities and Their Applications, Pittsburgh, Pennsylvania, 2005

Towards a Unifying Theory of Logical and Probabilistic Reasoning

Rolf Haenni
University of Berne

CH-3012 Berne, Switzerland
[email protected]

Abstract

Logic and probability theory both have a long history in science. They are mainly rooted in philosophy and mathematics, but are nowadays important tools in many other fields such as computer science and, in particular, artificial intelligence. Some philosophers have studied the connection between logical and probabilistic reasoning, and some attempts to combine these disciplines have been made in computer science, but logic and probability theory are still widely considered to be separate theories that are only loosely connected. This paper introduces a new perspective which shows that logical and probabilistic reasoning are no more and no less than two opposite extreme cases of one and the same universal theory of reasoning, called probabilistic argumentation.1

Keywords. Logical reasoning, probabilistic reasoning, uncertainty, ignorance, argumentation.

1 Introduction

Guessing the outcome of tossing a coin, given that the coin is known to be fair, is certainly different from guessing the outcome of tossing an unknown coin. Both are situations of uncertain reasoning, but with respect to the available information, the second one is less informative. We will refer to it as a situation of uncertain reasoning under (partial) ignorance or ambiguity.2

Initially, we may judge the two possible outcomes (head or tail) to be equally likely in both situations, but in the case of an unknown coin, one should be prepared to revise the judgment upon inspecting the coin or doing some experiments. As a consequence, if asked to bet money on tossing an unknown coin, people tend either to ask to see the coin or to reject the bet, whereas betting on a fair coin is usually considered to be "safe". Knowing that the coin is fair thus provides a more solid ground for conclusions or decisions.

1 This paper is an extended version of a paper accepted at the ECSQARU'05 conference [9]. It has been adapted for the particular interests of the SIPTA community.

2 Some authors define ambiguity as the "uncertainty about probability, created by missing information that is relevant and could be known" [1, 4]. In this paper, we prefer to talk about ignorance as a general term for missing information.

Classical probabilistic approaches to uncertain reasoning are based on the Bayesian probability interpretation, in which additive probabilities are used to represent degrees of belief of rational agents in the truth of statements. The problem is that probability theory, if applied in the classical way, cannot convey how much evidence one has. In the coin tossing example, this means that the same probability 1/2 is assigned in all cases, although the two situations are not the same at all.

To remedy this problem, a number of techniques have been developed, e.g. interval-valued probabilities [18, 19, 28, 32], second-order probabilities [8, 17], imprecise probabilities [29, 30, 31], Dempster-Shafer Theory [3, 25], the Transferable Belief Model [27], the Theory of Hints [15], subjective logic [12], and many more. What most of these approaches have in common is that evaluating the truth of a statement yields two values, not just a single one as in classical probability theory. Depending on the context, the first value is either a lower probability bound, a measure of belief, or a degree of support, whereas the second one is either an upper probability bound, a measure of plausibility, or a degree of possibility. In all cases, it is possible to interpret the difference between the two values (i.e. the length of the corresponding interval) as a measure of ignorance relative to the hypothesis in question.

Alternatively, one can understand the two values to result from sub-additive probabilities of provability (or epistemic probabilities) [2, 7, 21, 22, 23, 26], for which the two respective values for a hypothesis and its complement do not necessarily sum up to one, i.e. where the ignorance is measured by the remaining gap.


1.1 The Basic Idea

This paper will further explore the "probability of provability" point of view. The idea is very simple. Consider two complementary hypotheses H and Hc, and let E denote the given evidence (what is known about the world). Instead of looking at the posterior probabilities P(H|E) and P(Hc|E), respectively, we will look at the respective probabilities that H and Hc are known or provable in the light of E.3 Note that under certain circumstances the evidence E may not be informative enough to prove either of them, i.e. we need to consider posterior probabilities of the events "H is provable", "Hc is provable", and "Neither of them is provable". Since these alternatives are exclusive and exhaustive, it is clear that

P(H is provable|E) + P(Hc is provable|E) + P(Neither of them is provable|E) = 1.

Furthermore, since P(Neither of them is provable|E) may be positive, which implies P(H is provable|E) + P(Hc is provable|E) ≤ 1, it becomes clear where the sub-additive nature of probabilities of provability comes from. It is a simple consequence of applying classical probability theory to a special class of so-called epistemic events [22, 23].

To illustrate this simple idea, suppose your friend Sheila flips a fair coin and promises to organize a party tomorrow night provided that the coin lands on head. Sheila does not say anything about what she is doing in case the coin lands on tail, i.e. she may or may not organize the party. What is the probability of provability that there will (or will not) be a party tomorrow night?

The given evidence E consists of two pieces. The first one is Sheila's promise, which may be appropriately encoded by the propositional sentence head → party. The second one is the fact that the two possible outcomes of tossing a fair coin are known to be equally likely, which is encoded by the uniform distribution P(head) = P(tail) = 1/2.

In this simple situation, we can look at head as a defeasible or hypothetical proof (or an argument) for party, and because it is the only such proof, we conclude that P(party is provable|E) = P(head) = 1/2. This means that the chance of being in a situation in which the party is certain to take place is 50%. On the other hand, because the given evidence does not allow any conclusions about ¬party, there is no hypothetical proof for ¬party, and we conclude that P(¬party is provable|E) = 0. This means that nothing speaks against the possibility that the party takes place, or in other words, the party is entirely plausible or possible.

3 We will later give a precise definition of what is meant by "the probability that H (or Hc) is provable given E". An illustrative example is given below.
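This probability-of-provability computation can be sketched in a few lines of Python. The encoding of the atomic events and the helper `provable` are our own illustrative choices, not part of the paper's formalism:

```python
from itertools import product

# Atomic events: assignments to the two variables (coin, party).
# Evidence E: Sheila's promise "head -> party", i.e. we exclude the
# event where the coin lands on head but there is no party.
events = [e for e in product(["head", "tail"], [True, False])
          if not (e[0] == "head" and e[1] is False)]

prior = {"head": 0.5, "tail": 0.5}  # fair coin

def provable(hypothesis):
    """P(hypothesis is provable | E): sum the probabilities of the
    scenarios (coin outcomes) under which *every* remaining atomic
    event satisfies the hypothesis."""
    total = 0.0
    for coin, p in prior.items():
        compatible = [e for e in events if e[0] == coin]
        if compatible and all(hypothesis(e) for e in compatible):
            total += p
    return total

print(provable(lambda e: e[1]))      # P(party is provable | E)  -> 0.5
print(provable(lambda e: not e[1]))  # P(¬party is provable | E) -> 0.0
```

The scenario head forces party under the evidence, so it contributes its full probability 1/2 to the first value, while no scenario forces ¬party.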

1.2 Goals and Overview

This paper introduces a new theory of uncertain reasoning under ignorance that deals with probabilities of provability. We will refer to it as the theory of probabilistic argumentation, in which probabilities of provability are alternatively called degrees of support. It turns out that probabilistic argumentation includes the two classical approaches to formal reasoning, namely logical and probabilistic reasoning, as special cases. Showing how to link or unify logical and probabilistic reasoning is one of the primary goals of this paper. The key issue is to realize that the provability and the probability of a hypothesis are both degenerate cases of probabilities of provability. They result as opposite extreme cases if one considers different sets of so-called probabilistic variables (see Sect. 4 for details).

As mentioned at the beginning, the motivation behind probabilistic argumentation and the idea of using two rather than one value to measure uncertainty is similar to many other approaches. The connection to Dempster-Shafer theory, for example, has been discussed in [11], and the connection between Dempster-Shafer theory and interval-valued or imprecise probabilities is well known [29]. Other connections need to be explored, but this is not the topic of this paper.

The organization of this paper is bottom-up. We will start by discussing the basic concepts and properties of logical and probabilistic reasoning in Sect. 2 and 3, respectively. The goal is to emphasize the similarities and differences between these distinct types of formal reasoning. Probabilistic argumentation is introduced in Sect. 4. We will demonstrate that logical and probabilistic reasoning are contained as special cases. Finally, based on the notion of degrees of support, Sect. 5 describes uncertainty and ignorance as orthogonal aspects of our limited knowledge about the world.

2 Logical Reasoning

Logic has a long history in science, dating back to the Greek philosopher Aristotle. For a long time, logic has primarily been a philosophical discipline, but nowadays it is also an important research topic in mathematics and computer science. The idea is to express knowledge by a set of sentences Σ which forms the knowledge base. The set L of all possible sentences is the corresponding formal language.

If another sentence h ∈ L represents a hypothesis of interest, we may want to know whether h is a logical consequence of Σ or not. Formally, we write Σ |= h if h logically follows from Σ, and Σ ⊭ h otherwise. In other words, logical reasoning is concerned with the provability of h with respect to Σ.

If ¬h denotes the complementary hypothesis of h (the sentence ¬h is true whenever h is false and vice versa), then it is possible to evaluate the provability of ¬h independently of the provability of h. As a consequence, we must distinguish the following four cases:4

• LI: Σ ⊭ h, Σ ⊭ ¬h,
⇒ Σ does not allow any conclusions about h;

• LT: Σ |= h, Σ ⊭ ¬h,
⇒ h is true in the light of Σ, i.e. ¬h is false;

• LF: Σ ⊭ h, Σ |= ¬h,
⇒ h is false in the light of Σ, i.e. ¬h is true;

• LC: Σ |= h, Σ |= ¬h,
⇒ the knowledge base is inconsistent or contradictory, i.e. Σ |= ⊥.

Note that a definite answer is only given for LT or LF, whereas LI corresponds to a situation in which we are ignorant with respect to h. This means that the available information is insufficient to prove either h or its complement ¬h. In the particular case of Σ = ∅, this happens for all possible hypotheses h ≠ ⊤. It thus reflects an important aspect of logical reasoning, namely that the absence of information does not allow meaningful conclusions. Finally, LC appears only if the knowledge base Σ is inconsistent. Inconsistent knowledge bases are usually excluded, and we will therefore consider them as invalid.

Suppose now that we are interested in a graded measure of provability, and let us call it degree of support dsp(h) ∈ [0, 1] ∪ {undefined}. Intuitively, the idea of dsp(h) is to express quantitatively the strength with which the available knowledge supports h. Provided that Σ is consistent, the support is maximal if h is a logical consequence of Σ, and it is minimal if Σ does not allow h to be inferred. In the light of this remark, we can define degree of support for logical reasoning in the following way:

dsp(h) :=
  0,          if Σ ⊭ h and Σ ⊭ ⊥,
  1,          if Σ |= h and Σ ⊭ ⊥,
  undefined,  if Σ |= ⊥.    (1)

4 L stands for "Logical Reasoning", T for "True", F for "False", I for "Ignorant", and C for "Contradictory".

To further illustrate this, consider a set V = {X, Y} of variables, corresponding sets ΘX and ΘY of possible values, and an appropriate formal language LV that includes statements about the variables X and Y. Vectors x = 〈x, y〉 ∈ NV with NV = ΘX × ΘY are then the possible interpretations of the language LV, and logical reasoning can be understood in terms of sets of interpretations instead of sets of sentences. With NV(ξ) ⊆ NV we denote the set of all interpretations for which the sentence ξ ∈ LV (or all sentences of a set ξ ⊆ LV) is true. The elements of NV(ξ) are also called models of ξ. Instead of Σ and h we will now work with the corresponding sets of models E := NV(Σ) and H := NV(h), respectively. By doing so, the consequence relation Σ |= h translates into E ⊆ H. This allows us to define degree of support for logical reasoning in the following form:

dsp(H) :=
  0,          if ∅ ≠ E ⊈ H,
  1,          if ∅ ≠ E ⊆ H,
  undefined,  if E = ∅.    (2)
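Under the model-based view, definition (2) amounts to a subset test between sets of interpretations. A minimal sketch (the variable names and the toy knowledge base are ours):

```python
def dsp(E, H):
    """Degree of support for logical reasoning, Eq. (2):
    1 if the models of the knowledge base are contained in those of
    the hypothesis, 0 if not, undefined for an inconsistent base."""
    if not E:                  # E = ∅: inconsistent knowledge base
        return None            # 'undefined'
    return 1 if E <= H else 0  # E ⊆ H corresponds to Σ |= h

# Interpretations over V = {X, Y} with ΘX = ΘY = {0, 1}:
N_V = {(x, y) for x in (0, 1) for y in (0, 1)}
E = {(1, 0), (1, 1)}           # models of Σ (say, Σ = {X})
H = {(1, 0), (1, 1), (0, 1)}   # models of h (say, h = X ∨ Y)

print(dsp(E, H))        # 1: Σ |= h
print(dsp(E, N_V - H))  # 0: Σ ⊭ ¬h
```

Note that dsp(E, H) and dsp(E, N_V − H) may both be 0, which is exactly the sub-additive case LI.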

With such a graded measure of provability in mind, we can think of the four different cases of logical reasoning according to Fig. 1. The inconsistent case LC is shown outside the unit square to indicate that degree of support is undefined.

[Figure omitted: unit square with dsp(H) on the horizontal and dsp(Hc) on the vertical axis, marking the cases LI, LT, and LF, with the inconsistent case LC (Σ |= ⊥) outside the square.]

Figure 1: The four cases of logical reasoning.

Note that logical reasoning requires a sub-additive measure of provability, because LI is characterized by dsp(H) = dsp(Hc) = 0, which implies dsp(H) + dsp(Hc) = 0. Hence, sub-additivity turns out to be a natural property of logical reasoning, whereas probabilistic reasoning rejects it (see Sect. 3).

Another important remark is that logical reasoning is monotone with respect to the available knowledge. This means that adding new sentences to Σ will never cause a transition from LT to LF or vice versa, i.e. if something is known to be true (false), it will never become false (true). The complete transition diagram for logical reasoning is shown in Fig. 2.


[Figure omitted: transition diagram of degrees of support over the unit square.]

Figure 2: Logical reasoning is monotone.

Note that human reasoning is not monotone at all. In fact, monotonicity is considered to be one of the major drawbacks of logical reasoning. This has led to the development of numerous non-monotone logics.

3 Probabilistic Reasoning

The history of probability theory is not as old as the history of logic, but it also dates back to the 17th and 18th century. The goal is to measure the possible truth of a hypothesis H by a corresponding posterior probability

P′(H) = P(H|E) = P(H ∩ E) / P(E)    (3)

for some observed evidence E. P′(H) = 1 means that H is true in the light of E, whereas P′(H) = 0 implies that H is false. Intermediate values between 0 and 1 represent all possible graded judgments between the two extremities. Probability theory is based on Kolmogorov's axioms [16], which, among other things, stipulate additivity. This means that P′(H) + P′(Hc) = 1 for all hypotheses H.

Usually, probabilistic reasoning starts with a multivariate model that consists of a set of variables V and a prior probability distribution P(V). A vector x that assigns a value to every variable of V is called an atomic event. The hypothesis H and the evidence E are sets of such atomic events. With NV we denote the set of all atomic events. P(V) can thus be regarded as an additive mapping that assigns values P(x) ∈ [0, 1] to all elements x ∈ NV.5 With P′(V) = P(V|E) we denote the corresponding posterior distribution for some given evidence E ≠ ∅. Note that P′(x) = 0 for all atomic events x ∉ E. The posterior distribution is undefined for E = ∅.

5 If some or all variables of V are continuous, then density functions are needed to specify the prior distribution. The discussion in this paper will be limited to discrete variables, but this is not a conceptual restriction.

Prior distributions are often specified with the aid of Bayesian networks [21], in which the variables of the model correspond to the nodes of a directed acyclic graph. The benefits of Bayesian networks are at least twofold: first, they provide an economical way of specifying prior distributions for large sets V, and second, they play a central role in computing posterior probabilities efficiently.

For a given prior distribution P(V ), it is possible tocompute prior probabilities P (H) by

P (H) =∑x∈H

P (x). (4)

Similarly, corresponding posterior probabilities are obtained for E ≠ ∅ by

P′(H) = ∑_{x ∈ H} P′(x) = k · ∑_{x ∈ H∩E} P(x),    (5)

where k = P(E)^{-1} denotes the normalization constant. For the particular case of V = {X, Y}, Fig. 3 illustrates the relationship between prior and posterior distributions.
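Equations (4) and (5) translate directly into a sum over atomic events. A small sketch, with a toy prior that is entirely invented for illustration:

```python
def posterior(prior, E, H):
    """Posterior probability P'(H) = k * sum_{x in H∩E} P(x), Eq. (5),
    with normalization constant k = P(E)^(-1); undefined for E = ∅."""
    p_E = sum(prior[x] for x in E)
    if p_E == 0:
        return None  # 'undefined'
    return sum(prior[x] for x in H & E) / p_E

# Toy prior over atomic events for V = {X, Y}, both Boolean:
prior = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
E = {(1, 0), (1, 1)}   # evidence: X = 1
H = {(0, 1), (1, 1)}   # hypothesis: Y = 1

print(posterior(prior, E, H))  # 0.4 / 0.7 ≈ 0.571
```

The prior probability P(H) of Eq. (4) is recovered as the special case E = NV, where k = 1.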

[Figure omitted: the prior P(X, Y) over the space of atomic events and the posterior P′(X, Y) = P(X, Y|E) concentrated on the evidence E, with the hypothesis H shown as a subset.]

Figure 3: The relationship between prior and posterior distributions.

In order to make an analogy to the discussion of logical reasoning in Sect. 2, we can also distinguish four cases of probabilistic reasoning:6

• PU: E ≠ ∅, 0 < P′(H) < 1,
⇒ P(V) and E do not allow a final judgment;

• PT: E ≠ ∅, P′(H) = 1,
⇒ H is true in the light of P(V) and E;

• PF: E ≠ ∅, P′(H) = 0,
⇒ H is false in the light of P(V) and E;

• PC: E = ∅, P′(H) = undefined,
⇒ the evidence is invalid.

6 P stands for "Probabilistic Reasoning", T for "True", F for "False", U for "Uncertain", and C for "Contradictory".

Note that there is a one-to-one correspondence between the cases PT and LT, PF and LF, and PC and LC. In the sense that neither PU nor LI allows a final judgment with respect to H, they can also be considered as similar cases. But they differ in that LI, in contrast to PU, represents a state of total ignorance.

To further illustrate the analogies and differences between logical and probabilistic reasoning, all possible pairs of values P′(H) and P′(Hc) are shown in Fig. 4. In accordance with the discussion about logical reasoning in Sect. 2, posterior probabilities P′(H) and P′(Hc) will now be considered as (additive) degrees of support dsp(H) and dsp(Hc), respectively.

As a consequence of the additivity requirement, and for E ≠ ∅, all possible pairs dsp(H) and dsp(Hc) lie on the diagonal between the upper left and the lower right corner of the unit square shown in Fig. 4. Case LI, and logical reasoning in general, are thus not covered by probabilistic reasoning. On the other hand, logical reasoning does not include PU. That is the fundamental difference between logical and probabilistic reasoning.

[Figure omitted: unit square with axes dsp(H) and dsp(Hc); PT, PF, and PU lie on the diagonal, and the invalid case PC (E = ∅) lies outside the square.]

Figure 4: The four cases of probabilistic reasoning.

In comparison with logical reasoning, an important benefit of probabilistic reasoning is the fact that it is non-monotone with respect to the available knowledge. Except for dsp(H) = 0 and dsp(H) = 1, degree of support may thus increase or decrease when new evidence is added. As a consequence, even if a hypothesis is almost perfectly likely to be true (false), it may later turn out to be false (true). Figure 5 illustrates the possible transitions of degrees of support in the context of probabilistic reasoning.

[Figure omitted: transition diagram of degrees of support along the diagonal of the unit square.]

Figure 5: Probabilistic reasoning is non-monotone.

4 Probabilistic Argumentation

So far, we have tried to point out the similarities and differences between logical and probabilistic reasoning. In a nutshell, logical reasoning is sub-additive and monotone, whereas probabilistic reasoning is additive and non-monotone. Our goal now is to define a more general formal theory of reasoning that is sub-additive and non-monotone at the same time. In other words, the idea is to build a common roof for logical and probabilistic reasoning. We are thus interested in sub-additive and non-monotone degrees of support, which means that the judgment of a hypothesis may result in any point within the triangle shown in Fig. 6.

[Figure omitted: the triangle dsp(H) + dsp(Hc) ≤ 1 within the unit square, with the inconsistent case Σ |= ⊥ outside it.]

Figure 6: Sub-additive and non-monotone degrees of support.

In order to build such a unifying theory, we must first try to better understand the origin of the differences between logical and probabilistic reasoning. The key point to realize is the following: probabilistic reasoning presupposes the existence of a probability distribution over all variables, whereas logical reasoning does not deal with probability distributions at all, i.e. it presupposes a probability distribution over none of the variables involved. If we call the variables over which a probability distribution is known probabilistic (or exogenous), we can say that a probabilistic model consists of probabilistic variables only, whereas all variables of a logical model are non-probabilistic (or endogenous). From this point of view, the main difference between logical and probabilistic reasoning is the number of probabilistic variables. This simple observation turns out to be crucial for understanding the similarities and differences between logical and probabilistic reasoning.7

With this remark in mind, building a more general theory of reasoning is straightforward. The idea is to allow an arbitrary number of probabilistic variables. More formally, if V = {X1, ..., Xn} is the set of all variables involved, we will use A ⊆ V to denote the subset of probabilistic variables. P(A) is the corresponding prior distribution over A. In the "Party Example" of Sect. 1, we can think of two Boolean variables Coin and Party with a given (uniform) prior distribution over Coin. This implies V = {Coin, Party} and A = {Coin}.

Clearly, logical reasoning is characterized by A = ∅ and probabilistic reasoning by A = V, but we are now interested in the general case of arbitrary sets of probabilistic variables. We will refer to it as the theory of probabilistic argumentation.8 The connection between probabilistic argumentation and the classical fields of logical and probabilistic reasoning is depicted in Fig. 7.

[Figure omitted: a spectrum of the percentage of probabilistic variables from 0% to 100%, with logic at 0%, probability theory at 100%, and probabilistic argumentation covering the whole range.]

Figure 7: Different sets of probabilistic variables.

In the general context of arbitrary sets of probabilistic variables, the goal is to define degree of support as a sub-additive and non-monotone measure of uncertainty and ignorance. We suppose that our knowledge or evidence is encoded by a set of sentences Σ ⊆ LV, which then determines a set of possible atomic events E := NV(Σ) ⊆ NV. It is assumed that the true state of the world is exactly one element of E. A quadruple A = (V, A, P(A), Σ) is called a probabilistic argumentation system. In the following, we will focus on sets E of atomic events rather than sets Σ of sentences. Accordingly, we will consider hypotheses H ⊆ NV rather than corresponding sentences h ∈ LV.

7 The literature on how to combine logic and probability is huge, but the idea of distinguishing probabilistic and non-probabilistic variables seems to be a new one.

8 Previous work on probabilistic argumentation focuses on propositional languages LP; see [14, 10, 11].

The definition of degree of support will be based on two observations. The first one is the fact that the set E ⊆ NV, which restricts the set of possible atomic events relative to V, also restricts the atomic events relative to A. The set of all atomic events with respect to A is denoted by NA. Its elements s ∈ NA are called scenarios. P(s) denotes the corresponding prior probability of a scenario s. By projecting E from NV to NA, we get the set E↓A ⊆ NA of possible scenarios that are consistent with E. This means that exactly one element of E↓A corresponds to the true state of the world. All other scenarios s ∉ E↓A are impossible. This allows us to condition P(A) on E↓A, i.e. we need to replace the prior distribution P(A) by a posterior distribution P′(A) = P(A|E↓A). This is illustrated in Fig. 8 for the particular case of V = {X, Y} and A = {X}.

[Figure omitted: the prior P(X) and the posterior P′(X) = P(X|E↓{X}) obtained by projecting the evidence E onto the probabilistic variable X.]

Figure 8: Prior and posterior distributions over probabilistic variables.
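The projection E↓A and the conditioning step P′(A) = P(A|E↓A) can be sketched as follows; the three-valued variable X and its prior are hypothetical numbers chosen only to make one scenario inconsistent:

```python
def project(E, A_idx):
    """Project a set of atomic events over V onto the probabilistic
    variables A (given by their index positions): E -> E↓A."""
    return {tuple(x[i] for i in A_idx) for x in E}

def condition(prior_A, E_down_A):
    """Replace the prior P(A) by the posterior P'(A) = P(A | E↓A):
    impossible scenarios get probability 0, the rest is renormalized."""
    k = sum(p for s, p in prior_A.items() if s in E_down_A)
    return {s: (p / k if s in E_down_A else 0.0)
            for s, p in prior_A.items()}

# V = {X, Y}, A = {X}; atomic events are (x, y) pairs.
E = {(0, 1), (1, 0), (1, 1)}
prior_X = {(0,): 0.2, (1,): 0.5, (2,): 0.3}  # hypothetical three-valued X
E_down = project(E, [0])                      # E↓A = {(0,), (1,)}
print(condition(prior_X, E_down))             # scenario (2,) becomes impossible
```

The scenario (2,) receives posterior probability 0 because no atomic event in E projects onto it, and the remaining mass is renormalized.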

The second observation goes in the other direction, that is from NA to NV. Let us assume that a certain scenario s ∈ NA is the true scenario. This reduces the set of possible atomic events with respect to V from E to

E|s := {x ∈ E : x↓A = s}.    (6)

Such a set is called the conditional knowledge base or conditional evidence given the scenario s. It contains all atomic events of E that are compatible with s. This idea is illustrated in Fig. 9 for V = {X, Y}, A = {X}, and three scenarios s0, s1, and s2. Note that we have E|s ≠ ∅ for every consistent scenario s ∈ E↓A, e.g. for s1 and s2. Otherwise, as in the case of s0, we have E|s = ∅.

Consider now a consistent scenario s ∈ E↓A for which E|s ⊆ H. This means that H is a logical consequence of s and E, and we can thus see s as a defeasible or hypothetical proof (or an argument) for H in the light of E. We must say defeasible, because it is unknown whether s is the true scenario or not. In other words, H is only supported by s, but not entirely proved. The set of all supporting scenarios is denoted by

SP(H) := {s ∈ E↓A : E|s ⊆ H} = {s ∈ NA : ∅ ≠ E|s ⊆ H}.    (7)

In the example of Fig. 9, the hypothesis H is supported by s2, but not by s0 or s1 (s0 is inconsistent). Note that SP(∅) = ∅ and SP(NV) = E↓A. Sometimes, the elements of SP(H) and SP(Hc) are also called arguments and counter-arguments of H, respectively. This is where the name of this theory originally comes from.

[Figure omitted: the evidence E and hypothesis H over V = {X, Y}, with the conditional knowledge bases E|s1 and E|s2 for the scenarios s1 and s2; the scenario s0 is inconsistent with E.]

Figure 9: Evidence conditioned on various scenarios.

The set of supporting scenarios is the key notion for the definition of degree of support. In fact, because every supporting scenario s ∈ SP(H) contributes to the possible truth of H, we can measure the strength of such a contribution by the posterior probability P′(s), and the total support for H corresponds to the sum

dsp(H) := P′(SP(H)) = ∑_{s ∈ SP(H)} P′(s)    (8)

over all elements of SP(H). Note that dsp(H) defines an ordinary (additive) probability measure in the classical sense of Kolmogorov, that is, P′(SP(H)) + P′(SP(H)^c) = 1 holds for all H ⊆ NV. However, because the sets SP(H) and SP(Hc) are not necessarily complementary, we have dsp(H) + dsp(Hc) ≤ 1 as required. Degrees of support should therefore be understood as sub-additive posterior probabilities of provability. Provided that E ≠ ∅, they are well-defined for all possible hypotheses H ⊆ NV, that is, even in cases in which the prior distribution P(A) does not cover all variables. This is a tremendous advantage over classical probabilistic reasoning, which presupposes the existence of a prior distribution over all variables.
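The whole construction — conditional evidence (6), supporting scenarios (7), and degree of support (8) — can be sketched for the party example of Sect. 1. The function and variable names are our own illustrative choices:

```python
def degree_of_support(prior_A, proj, E, H):
    """dsp(H), Eq. (8): posterior probability of the supporting
    scenarios SP(H) = {s : ∅ ≠ E|s ⊆ H}; `proj` maps an atomic
    event onto its scenario."""
    # Conditional evidence E|s for a scenario s, Eq. (6):
    cond = lambda s: {x for x in E if proj(x) == s}
    # Scenarios consistent with E (i.e. E↓A) and the normalization:
    possible = [s for s in prior_A if cond(s)]
    k = sum(prior_A[s] for s in possible)
    # Sum the posterior probabilities P'(s) over SP(H), Eq. (7)+(8):
    return sum(prior_A[s] / k for s in possible if cond(s) <= H)

# Party example: V = {Coin, Party}, A = {Coin}.
events = {("head", True), ("tail", True), ("tail", False)}  # models of head -> party
prior_coin = {("head",): 0.5, ("tail",): 0.5}               # fair coin
proj = lambda x: (x[0],)                                    # x -> x↓A
party = {x for x in events if x[1]}

print(degree_of_support(prior_coin, proj, events, party))           # dsp(party)  = 0.5
print(degree_of_support(prior_coin, proj, events, events - party))  # dsp(¬party) = 0.0
```

The degree of possibility then follows as dps(party) = 1 − dsp(¬party) = 1, matching the observation in Sect. 1 that the party is entirely plausible.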

An alternative to considering degrees of support of complementary hypotheses is to define so-called degrees of possibility by dps(H) := 1 − dsp(Hc). Hypotheses are then judged by a pair of values dsp(H) ≤ dps(H). Note that there is a strong connection to Bel(H) and Pl(H) in the context of Dempster-Shafer Theory [11].

To complete this section, we will briefly investigate how the classical fields of logical and probabilistic reasoning fit into this general theory of probabilistic argumentation.

Logical Reasoning is characterized by A = ∅. This has a number of consequences. First, it implies that the set of possible scenarios NA = {〈〉} consists of a single element 〈〉, which represents the empty vector of values. This means that P(〈〉) = 1 is the only possible prior distribution. Second, if we assume E ≠ ∅, we get E↓A = {〈〉} = NA and thus P′(〈〉) = 1. Furthermore, we have E|〈〉 = E, which allows us to rewrite (7) as

SP(H) = {〈〉} for ∅ ≠ E ⊆ H, and SP(H) = ∅ otherwise.   (9)

Finally, P(〈〉) = 1 implies dsp(H) = 1 for ∅ ≠ E ⊆ H, dsp(H) = 0 for ∅ ≠ E ⊈ H, and dsp(H) undefined for E = ∅. This corresponds to the definition of degree of support for logical reasoning in (2).
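In this boundary case, degree of support reduces to classical entailment of H by E. A minimal sketch (the function name dsp_logical and the None convention for the undefined case are ours):

```python
def dsp_logical(E, H):
    # A = ∅: dsp(H) is 1 if the non-empty evidence E entails H,
    # 0 if it does not, and undefined (None here) for inconsistent E.
    # E and H are sets of models.
    if not E:
        return None        # E = ∅: dsp is undefined
    return 1 if E <= H else 0

print(dsp_logical({1, 2}, {1, 2, 3}))  # 1: E ⊆ H, so H is proved
print(dsp_logical({1, 2}, {1}))        # 0: E ⊈ H
```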

Probabilistic Reasoning is characterized by A = V. This means that the sets NA and NV are identical, and it implies

E|s = {s} for s ∈ E, and E|s = ∅ otherwise.   (10)

From this it follows that SP(H) = H ∩ E, and it allows us to rewrite (8) as

dsp(H) = ∑_{s ∈ H∩E} P′(s) = ∑_{s ∈ H} P′(s),   (11)

which corresponds to the definition of degrees of support (posterior probabilities) in the case of probabilistic reasoning, see (5).
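For A = V, (11) is thus ordinary conditioning on E. A sketch with hypothetical toy numbers (the function name and state labels are ours):

```python
def dsp_probabilistic(prior, E, H):
    # A = V: every scenario is a full state, so dsp(H) collapses to the
    # ordinary posterior P'(H) = P(H ∩ E) / P(E), as in Eq. (11).
    return sum(prior[s] for s in H & E) / sum(prior[s] for s in E)

prior = {'s0': 0.25, 's1': 0.25, 's2': 0.25, 's3': 0.25}
E = {'s0', 's1', 's2'}            # evidence rules out s3
H = {'s0', 's1'}
print(dsp_probabilistic(prior, E, H))   # 0.5 / 0.75 ≈ 0.667
```

Here dsp(H) + dsp(Hc) = 2/3 + 1/3 = 1: with all variables probabilistic, degrees of support are additive, as the text above notes.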

5 Uncertainty vs. Ignorance

The previous section has pointed out the crucial role of the probabilistic variables in the context of probabilistic argumentation. We can consider them as representatives for atomic sources of uncertainty for which further details are supposed to be inaccessible. The corresponding prior distribution P(A) summarizes or quantifies the uncertainty stemming from them. In other words, the best we can expect to know about an atomic source of uncertainty is a prior distribution. Tossing a fair coin, for example, is such an atomic source of uncertainty. It is a complex physical


process whose exact details are inaccessible. But with P(head) = P(tail) = 1/2 it is possible to summarize the uncertainty involved.

As a consequence of using prior probabilities as a basic ingredient of probabilistic argumentation, the best we can expect from evaluating a hypothesis H are additive degrees of support for H and Hc, respectively. Such a situation corresponds to the case of ordinary posterior probabilities. Because we cannot expect more than this, we can say that our ignorance with respect to H is minimal for dsp(H) + dsp(Hc) = 1 (true for PU, PT, PF, LT, and LF). On the other hand, dsp(H) = dsp(Hc) = 0 reflects a situation in which our ignorance with respect to H is maximal (true for LI), that is H as well as Hc are totally unsupported by the given knowledge. Note that ignorance is always relative to a hypothesis: it may well be that the available knowledge is very informative with respect to a hypothesis H, whereas another hypothesis H′ is not affected at all. In the light of these remarks, we can formally define degree of ignorance by

dig(H) := 1 − dsp(H) − dsp(Hc),   (12)

which allows us to characterize PU, PT, PF, LT, and LF by dig(H) = 0 and LI by dig(H) = 1. Note that the evaluation of a hypothesis H may result in any intermediate degree of ignorance. We can thus use dig(H) to gradually measure the amount of the available knowledge that is relevant for the evaluation of H. In this sense, dig(H) also quantifies various levels of more or less complete knowledge bases. Figure 10 illustrates this idea, i.e. degree of ignorance is twice the distance d (marked by arrows) between the two parallels.

[Figure omitted: unit square with axes dsp(H) (horizontal) and dsp(Hc) (vertical), each ranging from 0 to 1; d marks the distance between the two parallels.]

Figure 10: Various levels of ignorance.

As one would expect, dig(H) typically decreases when new evidence arrives, but in general degree of ignorance is non-monotone. This is due to the fact that new information may independently affect the two sets SP(H) and SP(Hc), as well as the normalization constant and therewith the posterior distribution.

In Jøsang's subjective logic [12], a triple ω(H) := (dsp(H), dsp(Hc), dig(H)) is called an opinion, which is interpreted as the representation of subjective belief with respect to H. This idea is in full accordance with the basic concepts of probabilistic argumentation.

Independently of the actual degree of ignorance, we can argue that the uncertainty with respect to a hypothesis H is maximal whenever dsp(H) = dsp(Hc). This reflects a case in which H and Hc are equally supported by the given knowledge and the prior distribution. The two extreme cases of maximal uncertainty are dsp(H) = dsp(Hc) = 0 (maximal ignorance) and dsp(H) = dsp(Hc) = 0.5 (minimal ignorance). On the other hand, the uncertainty is minimal if H is known to be either true or false. Various levels of (un-)certainty are depicted in Fig. 11. The case of maximal uncertainty corresponds to the diagonal from the lower left corner to the center of the unit square. In general, the degree of uncertainty is √2 times the distance d (marked by arrows) between the two parallels.

[Figure omitted: unit square with axes dsp(H) (horizontal) and dsp(Hc) (vertical), each ranging from 0 to 1; d marks the distance between the two parallels.]

Figure 11: Various levels of uncertainty.

To make this point clearer, consider four different situations of tossing a coin: a) the coin is known to be fair; b) the coin is known to be faked with heads on both sides; c) the coin is known to be faked with tails on both sides; d) the coin is unknown. In each case, we consider two possible hypotheses H = {head} and Hc = {tail}. Corresponding probabilistic argumentation systems will then produce the following results:

a) dsp({head}) = dsp({tail}) = 0.5;
b) dsp({head}) = 1, dsp({tail}) = 0;
c) dsp({head}) = 0, dsp({tail}) = 1;
d) dsp({head}) = 0, dsp({tail}) = 0.

Note that these are the four extreme cases of probabilistic argumentation: the ignorance is minimal for a), b), and c), and maximal for d), whereas the uncertainty is minimal for b) and c) and maximal for a) and d).
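The four cases, together with dig(H) from (12) and the maximal-uncertainty condition dsp(H) = dsp(Hc), can be tabulated in a few lines (the case labels are ours):

```python
# (dsp(H), dsp(Hc)) for H = {head}, as in cases a)-d) above.
cases = {
    'a) fair coin':       (0.5, 0.5),
    'b) two-headed coin': (1.0, 0.0),
    'c) two-tailed coin': (0.0, 1.0),
    'd) unknown coin':    (0.0, 0.0),
}
for name, (sup_h, sup_hc) in cases.items():
    dig = 1 - sup_h - sup_hc        # degree of ignorance, Eq. (12)
    max_unc = (sup_h == sup_hc)     # maximal uncertainty condition
    print(f'{name}: dig = {dig}, maximal uncertainty: {max_unc}')
```

Running this reproduces the summary above: dig is 0 for a), b), c) and 1 for d), while the uncertainty is maximal exactly for a) and d).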


This discussion demonstrates that uncertainty and ignorance should be considered as distinct phenomena. In fact, as illustrated by Fig. 10 and Fig. 11, uncertainty and ignorance are orthogonal measures and reflect different aspects of our limited knowledge about the world. This important observation has been widely disregarded by people working on probabilistic reasoning or Bayesianism.

What are the benefits of properly distinguishing between uncertainty and ignorance using a sub-additive measure of support? First, by taking into account the possibility of lacking knowledge or missing data, it provides a more realistic and more complete picture with regard to the hypothesis to be evaluated in the light of the given knowledge. Second, if reasoning serves as a preliminary step for decision making, a proper measure of ignorance is useful to decide whether the available knowledge justifies an immediate decision. The idea is that high degrees of ignorance imply low confidence in the results due to lack of information. On the other hand, low degrees of ignorance result from situations where the available knowledge forms a solid basis for a decision. Therefore, decision making should always consider the additional option of postponing the decision until enough information is gathered.9 The study of decision theories with the option of further deliberation is a current research topic in the philosophy of economics [5, 13, 24]. Apart from that, decision making under ignorance is a relatively unexplored discipline. For an overview of attempts in the context of Dempster-Shafer theory we refer to [20]. A more detailed discussion of such a general decision theory is beyond the scope of this paper.

6 Conclusion

This paper introduces the theory of probabilistic argumentation as a general theory of reasoning under ignorance. The key concept of the theory is the notion of degree of support, a sub-additive and non-monotone measure of uncertainty and ignorance. Degrees of support are (posterior) probabilities of provability. Obviously, this includes the notion of provability from logical reasoning, as well as the notion of probability from probabilistic reasoning. The two classical approaches to automated reasoning – logical and probabilistic reasoning – are thus special cases

9 It's like in real life: people do not like decisions under ignorance. In other words, people prefer betting on events they know about. This psychological phenomenon is called ambiguity aversion and has been experimentally demonstrated by Ellsberg [6]. His observations are rephrased in Ellsberg's paradox, which is often used as an argument against decision-making on the basis of subjective probabilities.

of probabilistic argumentation. The parameter that makes them distinct is the number of probabilistic variables. Probabilistic argumentation is more general in the sense that it allows any number of probabilistic variables.

The significance and the consequences of this paper are manifold. In the field of automated reasoning, we consider probabilistic argumentation as a new foundation that unifies the existing approaches of logical and probabilistic reasoning. This will have a great impact on the understanding and the numerous applications of automated reasoning within and beyond AI. An important basic requirement for this is a generalized decision theory that takes into account the possibility of lacking information or missing data. Because probabilistic argumentation generally clarifies the relationship between logic and probability theory, it will also shed new light on topics from other areas such as philosophy, mathematics, or statistics. In fact, promising preliminary work shows that looking at statistical inference from the perspective of probabilistic argumentation helps to eliminate the discrepancies between the classical and the Bayesian approach to statistics.

Acknowledgements

Thanks to: (1) Michael Wachter for helpful remarks and proof-reading; (2) the anonymous reviewers for their constructive comments. This research is supported by the Swiss National Science Foundation, Project No. PP002–102652.

References

[1] C. Camerer and M. Weber. Recent developments in modelling preferences: Uncertainty and ambiguity. Journal of Risk and Uncertainty, 50:325–370, 1992.

[2] L. J. Cohen. The Probable and the Provable. Clarendon Press, Oxford, 1977.

[3] A. P. Dempster. A generalization of Bayesian inference. Journal of the Royal Statistical Society, 30:205–247, 1968.

[4] D. Dequech. Fundamental uncertainty and ambiguity. Eastern Economic Journal, 26(1):41–60, 2000.

[5] I. Douven. Decision theory and the rationality of further deliberation. Economics and Philosophy, 18(2):303–328, 2002.

[6] D. Ellsberg. Risk, ambiguity, and the Savage axioms. Quarterly Journal of Economics, 75:643–669, 1961.


[7] R. Fagin and J. Y. Halpern. Uncertainty, belief, and probability. Computational Intelligence, 7(3):160–173, 1991.

[8] H. Gaifman. A theory of higher order probabilities. In J. Y. Halpern, editor, TARK'86, 1st Conference on Theoretical Aspects of Reasoning about Knowledge, pages 275–292, Monterey, CA, 1986. Morgan Kaufmann.

[9] R. Haenni. Unifying logical and probabilistic reasoning. In ECSQARU'05, 8th European Conference on Symbolic and Quantitative Approaches to Reasoning under Uncertainty, Barcelona, Spain, 2005.

[10] R. Haenni, J. Kohlas, and N. Lehmann. Probabilistic argumentation systems. In J. Kohlas and S. Moral, editors, Handbook of Defeasible Reasoning and Uncertainty Management Systems, Volume 5: Algorithms for Uncertainty and Defeasible Reasoning, pages 221–288. Kluwer Academic Publishers, 2000.

[11] R. Haenni and N. Lehmann. Probabilistic argumentation systems: a new perspective on Dempster-Shafer theory. International Journal of Intelligent Systems (Special Issue: the Dempster-Shafer Theory of Evidence), 18(1):93–106, 2003.

[12] A. Jøsang. A logic for uncertain probabilities. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 9(3):279–311, 2001.

[13] D. Kelsey and J. Quiggin. Theories of choice under ignorance and uncertainty. Journal of Economic Surveys, 6(2):133–153, 1992.

[14] J. Kohlas, D. Berzati, and R. Haenni. Probabilistic argumentation systems and abduction. Annals of Mathematics and Artificial Intelligence, 34(1–3):177–195, 2002.

[15] J. Kohlas and P. A. Monney. A Mathematical Theory of Hints. An Approach to the Dempster-Shafer Theory of Evidence, volume 425 of Lecture Notes in Economics and Mathematical Systems. Springer-Verlag, 1995.

[16] A. N. Kolmogorov. Foundations of the Theory of Probability. Chelsea Publishing Company, New York, 1950.

[17] H. E. Kyburg. Higher order probabilities. In L. N. Kanal, T. S. Levitt, and J. F. Lemmer, editors, UAI'87, 3rd Conference on Uncertainty in Artificial Intelligence, pages 15–22, 1987.

[18] H. E. Kyburg. Higher order probabilities and intervals. International Journal of Approximate Reasoning, 2:195–209, 1988.

[19] H. E. Kyburg. Interval-valued probabilities. In G. de Cooman and P. Walley, editors, The Imprecise Probabilities Project. IPP Home Page, available at http://ippserv.rug.ac.be, 1998.

[20] H. T. Nguyen and E. A. Walker. On decision making using belief functions. In R. R. Yager, J. Kacprzyk, and M. Fedrizzi, editors, Advances in the Dempster-Shafer Theory of Evidence, pages 311–330. John Wiley and Sons, New York, 1994.

[21] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, CA, 1988.

[22] E. H. Ruspini. The logical foundations of evidential reasoning. Technical Report 408, SRI International, AI Center, Menlo Park, USA, 1986.

[23] E. H. Ruspini, J. Lowrance, and T. Strat. Understanding evidential reasoning. International Journal of Approximate Reasoning, 6(3):401–424, 1992.

[24] D. Schmeidler. Subjective probability and expected utility without additivity. Econometrica, 57(3):571–587, 1989.

[25] G. Shafer. The Mathematical Theory of Evidence. Princeton University Press, 1976.

[26] Ph. Smets. Probability of provability and belief functions. Journal de la Logique et Analyse, 133:177–195, 1991.

[27] Ph. Smets and R. Kennes. The transferable belief model. Artificial Intelligence, 66:191–234, 1994.

[28] B. Tessem. Interval probability propagation. International Journal of Approximate Reasoning, 7:95–120, 1992.

[29] P. Walley. Statistical Reasoning with Imprecise Probabilities. Monographs on Statistics and Applied Probability 42. Chapman and Hall, London, UK, 1991.

[30] P. Walley. Towards a unified theory of imprecise probability. International Journal of Approximate Reasoning, 24(2–3):125–148, 2000.

[31] K. Weichselberger. The theory of interval-probability as a unifying concept for uncertainty. International Journal of Approximate Reasoning, 24(2–3):149–170, 2000.

[32] R. R. Yager and V. Kreinovich. Decision making under interval probabilities. International Journal of Approximate Reasoning, 22:195–215, 1999.