




Journal of Intelligent Information Systems, 9, 181–202 (1997)
© 1997 Kluwer Academic Publishers. Manufactured in The Netherlands.

An Extended Relational Data Model For Probabilistic Reasoning

S.K.M. WONG [email protected]

Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2

Abstract. Probabilistic methods provide a formalism for reasoning about partial beliefs under conditions of uncertainty. This paper suggests a new representation of probabilistic knowledge. This representation encompasses the traditional relational database model. In particular, it is shown that probabilistic conditional independence is equivalent to the notion of generalized multivalued dependency. More importantly, a Markov network can be viewed as a generalized acyclic join dependency. This linkage between these two apparently different but closely related knowledge representations provides a foundation for developing a unified model for probabilistic reasoning and relational database systems.

Keywords: Relational database, probabilistic reasoning, knowledge representation, generalized acyclic join dependency, belief networks

1. Introduction

Probabilistic methods have been used extensively for plausible inference (Hajek et al., 1992, Kruse et al., 1988, Neapolitan, 1990, Pearl, 1988). Bayesian networks (Pearl, 1988) have become an important tool for probabilistic reasoning. In essence, a Bayesian network is a graphical representation of a joint probability distribution. Probabilistic dependencies are used to simplify the representation of a joint distribution; they determine what probabilistic information is required to specify the network. Clearly, without these dependencies, especially conditional independencies, it would be impractical to use probabilities for reasoning in intelligent systems.

On the other hand, the relational model (Maier, 1983) has been well established as the basis for designing database management systems. It uses many types of constraints, called data dependencies, as a semantic tool for expressing properties of data. Data dependencies such as functional and multivalued dependencies play a crucial role in schema design.

Many researchers have noticed similarities between the above two knowledge systems, including conditional independence versus embedded multivalued dependency, probabilistic dependencies versus knowledge dependencies, and the use of DAGs to represent dependencies in both systems (Dechter, 1990, Lauritzen and Spiegelhalter, 1988, Pearl and Verma, 1987, Pearl, 1988). This is perhaps best summarized by Lee (Lee, 1983) and more recently by Hill (Hill, 1993). In their research, Lauritzen and Spiegelhalter (Lauritzen and Spiegelhalter, 1988) discussed how to transform representations of a joint probability distribution. This paper, on the other hand, transforms the probabilistic model into an extended relational model by maintaining all conditional independencies in order to represent the joint probability distribution. In (Shafer et al., 1987), Shafer et al. pointed out that qualitative conditional independence is equivalent to embedded multivalued dependency in relational theory. We would like to emphasize that qualitative conditional independence is not probabilistic conditional independence. Dechter (Dechter, 1990) also stated that conditional independence parallels the notion of embedded multivalued dependency. However, conditional independence as defined by Dechter is not probabilistic conditional independence. Shafer (Shafer et al., 1987) commented that the work in relational theory has not brought out the analogy with probabilistic conditional independence. We believe this paper does exactly that. Whereas Pearl and Verma (Pearl and Verma, 1987) suggested that probabilistic conditional independence can be captured by the notion of embedded multivalued dependencies in relational databases, we will show that embedded multivalued dependency is a necessary but not a sufficient condition for probabilistic conditional independence.

This paper delves deeper than simply pointing out similarities between the two knowledge systems by suggesting a new representation of probabilistic knowledge. This representation encompasses the traditional relational database model. Implementation details of this extended relational data model, including query processing using a traditional query language, are discussed in a separate paper (Wong et al., 1995). It should perhaps be emphasized that the data model we propose is different from the one suggested by Barbara et al. (Barbara et al., 1992). Our main objective here is to report that there exists an elegant and intriguing relationship between the above two important but rather distinct areas of research. In particular, we show that probabilistic conditional independence is equivalent to the notion of generalized multivalued dependency in an extended relational database model. Multivalued dependency is a special case of probabilistic conditional independence. It is well known that an acyclic join dependency schema possesses many desirable properties (Beeri et al., 1983). By ignoring embedded conditional independencies, a Bayesian network can in fact be viewed as a generalized acyclic join dependency. The linkage established between these two knowledge representations provides a foundation for developing a unified model for both probabilistic reasoning and relational database systems.

This paper is organized as follows. In Section 2, we briefly outline some database concepts pertinent to our discussion. In Section 3, we introduce an extended relational data model, in which we define two important types of data dependencies, generalized multivalued and generalized acyclic join dependencies. The relationship between probabilistic independencies and generalized data dependencies is explored in Section 4.

2. Database Concepts

Let N be a finite set of variables called attributes. We will use upper case letters A, B, C, ... to denote a single attribute and ..., X, Y, Z to represent a subset of attributes. Each attribute A ∈ N takes on values from a finite domain V_A. Consider a subset of attributes X = {A1, A2, ..., Am} ⊆ N. Let V_X = V_A1 ∪ V_A2 ∪ ... ∪ V_Am be the domain of X. An X-tuple t_X is a mapping from X to V_X, i.e., t_X : X → V_X, with the restriction that for each attribute A ∈ X, t_X[A] must be in V_A. (We write t instead of t_X if X is understood.) Thus t is a mapping that associates a value in V_X with each corresponding attribute in X. If Y is a subset of X and t an X-tuple, then t[Y] denotes the Y-tuple obtained by restricting the mapping to Y. Let y = t[Y]. We call y a Y-value, which is also referred to as a configuration of Y. An X-relation r (or a relation r over X, or simply a relation r if X is understood) is a finite set of X-tuples or X-values. If r is an X-relation and Y is a subset of X, then by r[Y], the projection of relation r onto Y, we mean the set of tuples t[Y], where t is in r.

We define a database scheme R = {R1, R2, ..., RN} to be a set of subsets of N. We call the Ri's relation schemes. If r1, r2, ..., rN are relations, where ri is a relation over Ri (1 ≤ i ≤ N), then we call r = {r1, r2, ..., rN} a database over R. The join (natural join) of the relations in r (denoted by either r1 ⋈ r2 ⋈ ... ⋈ rN or ⋈r) is the set of all tuples t with attributes R1 ∪ R2 ∪ ... ∪ RN such that t[Ri] is in ri for each i (1 ≤ i ≤ N). We say that a relation r with attributes R = R1 ∪ R2 ∪ ... ∪ RN obeys the join dependency ⋈{R1, R2, ..., RN} = ⋈R if r = ⋈{r1, r2, ..., rN}, where ri = r[Ri], for 1 ≤ i ≤ N. It follows that the join dependency ⋈{R1, R2, ..., RN} holds for a relation r if and only if r contains each tuple t for which there are tuples t1, t2, ..., tN of r (not necessarily distinct) such that ti[Ri] = t[Ri] for each i (1 ≤ i ≤ N).

Multivalued dependency (MVD) (Delobel, 1978, Fagin et al., 1982, Fagin, 1977) is a special case of join dependency (JD) (Beeri et al., 1983, Maier, 1983). We say that the MVD X →→ Y holds for a relation r over R if for any t1, t2 ∈ r with t1[X] = t2[X], there exists a tuple t3 ∈ r such that t3[X] = t1[X], t3[Y] = t1[Y], and t3[R − XY] = t2[R − XY]. By XY, we mean X ∪ Y.
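The tuple-swap definition above can be checked by brute force on small relations; a minimal sketch (relations represented as lists of attribute-to-value dicts; names are illustrative, not from the paper):

```python
def satisfies_mvd(r, X, Y, R):
    """Check the MVD X ->-> Y on relation r over attribute list R.

    For every pair t1, t2 agreeing on X, the tuple taking t1's XY-part
    and t2's (R - XY)-part must also appear in r.
    """
    rows = {tuple(t[a] for a in R) for t in r}
    for t1 in r:
        for t2 in r:
            if all(t1[a] == t2[a] for a in X):
                t3 = tuple(t1[a] if (a in X or a in Y) else t2[a] for a in R)
                if t3 not in rows:
                    return False
    return True

# A small relation over {A1, A2, A3} in which A2 ->-> A1 holds:
R = ["A1", "A2", "A3"]
r = [dict(zip(R, v)) for v in
     [(0, 0, 0), (0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1)]]
print(satisfies_mvd(r, ["A2"], ["A1"], R))  # True
```

Dropping the row (0, 1, 1) breaks the dependency, since the required swap tuple would then be missing.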

3. An Extended Relational Data Model

In the proposed relational data model, each relation Φ_R represents a real-valued non-negative function φ_R on a set of attributes R = {A1, A2, ..., Am}, as shown in Figure 1, where t_ij ∈ V_Aj, i.e., t_i[Aj] ∈ V_Aj, and t_i is a configuration (tuple) of R. The function φ_R(t_i) defines the values of the attribute f_φR in relation Φ_R. The semantic interpretation of the function φ_R would depend very much on the particular application.

In the conventional database model, for example, φ(t) could be interpreted as the number of tuples t in a relation, if one is interested in keeping track of duplicate tuples resulting from a projection. Let Φ_U denote a universal relation with φ(t) = 1 for all tuples of U = A1A2...An. The relation Φ_R shown in Figure 1 can be interpreted as the relation obtained by projecting Φ_U onto R = A1A2...Am ⊆ U, where φ(t_i) is the number of tuples with t_i = t[R] in the original relation. Clearly, it is not necessary to use such a function φ to define a relation if counting of duplicate tuples is not an issue. We will show that the conventional relational database model is indeed a special case of the extended data model introduced here.

On the other hand, in a probabilistic model, the relation Φ_R shown in Figure 1 represents a joint probability distribution. That is, the function φ_R(t) on R, which defines the values of the attribute f_φR in relation Φ_R, is a joint (or marginal) probability distribution.

We can define an inverse relation (Φ_R)⁻¹ for Φ_R by setting φ_R⁻¹(t_i) = 1/φ_R(t_i) if φ_R(t_i) ≠ 0, and φ_R⁻¹(t_i) = φ_R(t_i) otherwise. The inverse relation (Φ_R)⁻¹ of the relation Φ_R is shown in Figure 2. The reason for introducing such an inverse relation will become clear when a specific application is considered.

Apart from the select, project, and natural join operators in a standard relational system, we define here two new relational operators, called marginalization and product join.

1. Marginalization


Φ_R:
    A1   A2   ...   Am   f_φR
    t11  t12  ...   t1m  φ_R(t1)
    t21  t22  ...   t2m  φ_R(t2)
    ...  ...  ...   ...  ...
    ts1  ts2  ...   tsm  φ_R(ts)

Figure 1. The relation Φ_R representing a function φ_R on R = {A1, A2, ..., Am}.

(Φ_R)⁻¹:
    A1   A2   ...   Am   f_φR⁻¹
    t11  t12  ...   t1m  φ_R⁻¹(t1)
    t21  t22  ...   t2m  φ_R⁻¹(t2)
    ...  ...  ...   ...  ...
    ts1  ts2  ...   tsm  φ_R⁻¹(ts)

Figure 2. The inverse relation (Φ_R)⁻¹ of Φ_R.

Let X be a subset of attributes of R. The operator of marginalization is denoted by the symbol ↓. The marginal Φ_R↓X of Φ_R is a relation on X ∪ {f_φR}. We can construct Φ_R↓X from Φ_R as follows:

(a) Project the relation Φ_R on the set of attributes X ∪ {f_φR}, without eliminating identical configurations (tuples).

(b) Let t be a tuple in Φ_R[R]. For every configuration t_X = t[X], replace the set of configurations of X ∪ {f_φR} in the relation obtained from step (a) by the singleton configuration:

    t_X ∗ ( Σ_{t_{R−X}} φ_R(t_X ∗ t_{R−X}) ),

where t_{R−X} = t[R − X] and t = t_X ∗ t_{R−X}. The symbol ∗ denotes concatenation of two tuples.

Consider, for example, the relation Φ_X defined by a function φ_X in Figure 3, with X = {A1, A2, A3} and V_A1 = V_A2 = V_A3 = {0, 1}. Suppose we want to compute the marginal Φ_X↓A1A2. From step (a), we obtain the table shown in Figure 4 by projecting Φ_X on {A1, A2, f_φX} without eliminating identical tuples.


Φ_X:
    A1  A2  A3  f_φX
    0   0   0   φ_X(0, 0, 0) = d1
    0   0   1   φ_X(0, 0, 1) = d2
    0   1   0   φ_X(0, 1, 0) = d3
    0   1   1   φ_X(0, 1, 1) = d3
    1   0   0   φ_X(1, 0, 0) = d4
    1   0   1   φ_X(1, 0, 1) = d4
    1   1   0   φ_X(1, 1, 0) = d5
    1   1   1   φ_X(1, 1, 1) = d6

Figure 3. A relation Φ_X with X = {A1, A2, A3} defined by a function φ_X on X.

Φ_X[{A1, A2, f_φX}]:
    A1  A2  f_φX
    0   0   d1
    0   0   d2
    0   1   d3
    0   1   d3
    1   0   d4
    1   0   d4
    1   1   d5
    1   1   d6

Figure 4. The "projection" of Φ_X on {A1, A2, f_φX}.

The resultant relation, the marginal Φ_X↓A1A2 of Φ_X, obtained from step (b), is shown in Figure 5.

Φ_X↓A1A2:
    A1  A2  f_{φX↓A1A2}
    0   0   d1 + d2
    0   1   d3 + d3
    1   0   d4 + d4
    1   1   d5 + d6

Figure 5. The marginal relation Φ_X↓A1A2 of Φ_X.
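Steps (a) and (b) of marginalization amount to a group-by and sum; a minimal sketch (dict-keyed relations; names are illustrative), reproducing Figure 5 with the symbolic entries d1, ..., d6 given arbitrary numeric values:

```python
def marginalize(phi, X):
    """Marginal of phi onto attribute set X: project each configuration
    onto X (step (a)) and sum the function values of configurations that
    collide (step (b))."""
    out = {}
    for config, value in phi.items():
        t_x = tuple((a, v) for a, v in config if a in X)  # step (a)
        out[t_x] = out.get(t_x, 0) + value                # step (b)
    return out

# The relation Phi_X of Figure 3, with d1..d6 = 1..6 for illustration:
d1, d2, d3, d4, d5, d6 = 1, 2, 3, 4, 5, 6
phi = {
    (("A1", 0), ("A2", 0), ("A3", 0)): d1,
    (("A1", 0), ("A2", 0), ("A3", 1)): d2,
    (("A1", 0), ("A2", 1), ("A3", 0)): d3,
    (("A1", 0), ("A2", 1), ("A3", 1)): d3,
    (("A1", 1), ("A2", 0), ("A3", 0)): d4,
    (("A1", 1), ("A2", 0), ("A3", 1)): d4,
    (("A1", 1), ("A2", 1), ("A3", 0)): d5,
    (("A1", 1), ("A2", 1), ("A3", 1)): d6,
}
m = marginalize(phi, {"A1", "A2"})
# As in Figure 5: (0,0) -> d1+d2, (0,1) -> d3+d3, (1,0) -> d4+d4, (1,1) -> d5+d6
print(m[(("A1", 0), ("A2", 0))])  # 3
```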

2. Product Join

Consider two relations Φ_X, Ψ_Y defined respectively by functions φ_X and ψ_Y. The product join of Φ_X and Ψ_Y, written Φ_X × Ψ_Y, is defined as follows:
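The formal definition that follows this sentence falls on a page missing from this extract. Judging from the operator's later use, in the quotient of equation (11) and the worked tables of Figure 10, the product join natural-joins the two relations on their shared attributes and multiplies their function values. A sketch under that assumption (dict-based relations; names are illustrative):

```python
def product_join(phi, psi):
    """Assumed semantics of the product join Phi x Psi: natural join on
    the shared attributes, multiplying function values (cf. eq. (11))."""
    out = {}
    for c1, v1 in phi.items():
        d1 = dict(c1)
        for c2, v2 in psi.items():
            d2 = dict(c2)
            shared = set(d1) & set(d2)
            if all(d1[a] == d2[a] for a in shared):  # join condition
                merged = tuple(sorted({**d1, **d2}.items()))
                out[merged] = v1 * v2
    return out

# Toy check: phi on {A, B}, psi on {B, C}, joined on B.
phi = {(("A", 0), ("B", 0)): 2.0, (("A", 1), ("B", 0)): 3.0}
psi = {(("B", 0), ("C", 0)): 0.5}
print(product_join(phi, psi)[(("A", 0), ("B", 0), ("C", 0))])  # 1.0
```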


3.1. Generalized multivalued dependency

We now introduce the key notion of generalized multivalued dependency (GMVD) in our relational model. Consider a relation Φ_R over the set of attributes R ∪ {f_φR} as shown in Figure 1. Let X and Y be disjoint subsets of R = {A1, A2, ..., Am} and Z = R − XY. Relation Φ_R satisfies the GMVD X −◦→ Y if Φ_R decomposes losslessly into relations Φ_R↓XY and Φ_R↓XZ, namely:

    Φ_R = Φ_R↓XY ⊗ Φ_R↓XZ,    (1)

where the monotone join operator ⊗ is defined as: for any V, W ⊆ R,

    Φ_R↓V ⊗ Φ_R↓W = Φ_R↓V × Φ_R↓W × (Φ_R↓V∩W)⁻¹.    (2)

In the above definition, × is the product join operator and (Φ_R↓V∩W)⁻¹ denotes the inverse relation of Φ_R↓V∩W.

The following demonstrates that GMVD is a generalization of multivalued dependency in a conventional relational database model.

The notion of multivalued dependency can be understood in terms of sorting and counting. First, let us define a function n_W[X = x] on the relation Φ_R[R] to count the number of distinct W-values associated with a given X-value in the relation:

    n_W[X = x](Φ_R[R]) = | { t[W] | t ∈ Φ_R[R], t[X] = x } |,    (3)

where |·| denotes the cardinality of a set and X, W are arbitrary subsets of R. It can be verified that relation Φ_R[R] satisfies the MVD X →→ Y if and only if for any X-value x in Φ_R[R],

    n_R[X = x](Φ_R[R]) = n_XY[X = x](Φ_R[R]) · n_XZ[X = x](Φ_R[R]),    (4)

where Z = R − XY. Since n_XW[X = x] = n_W[X = x], we can simplify condition (4) to: for any X-value x in Φ_R[R],

    n_R[X = x](Φ_R[R]) = n_Y[X = x](Φ_R[R]) · n_Z[X = x](Φ_R[R]).    (5)

In fact, multivalued dependency can be equivalently defined as follows (Maier, 1983): Φ_R[R] satisfies the MVD X →→ Y if, for every X-value x and Y-value y in Φ_R[R] such that xy = t[XY] for some t ∈ Φ_R[R],

    n_Z[X = x](Φ_R[R]) = n_Z[XY = xy](Φ_R[R])
                       = | { t[Z] | t ∈ Φ_R[R], t[X] = x and t[Y] = y } |,    (6)

where XY denotes the set union X ∪ Y. Likewise, n_Y[X = x](Φ_R[R]) = n_Y[XZ = xz](Φ_R[R]), where xz = t[XZ] and t ∈ Φ_R[R].

The above results about MVDs are summarized by the following lemma.

Lemma 1. Let Φ_R[R] be the projection of Φ_R onto R, and let X and Y be disjoint subsets of R and Z = R − XY. Relation Φ_R[R] satisfies the MVD X →→ Y if and only if for every X-value x, Y-value y, and Z-value z in Φ_R[R] such that the XYZ-value xyz appears in Φ_R[R]:

    n_R[X = x](Φ_R[R]) = n_Y[XZ = xz](Φ_R[R]) · n_Z[XY = xy](Φ_R[R]).

Consider a constant relation Φ_R defined by a constant function φ_R, i.e., for any configuration t ∈ Φ_R[R], φ_R(t) is equal to a constant c, as shown in Figure 7.

Φ_R:
    A1   A2   ...   Am   f_φR
    t11  t12  ...   t1m  φ_R(t1) = c
    t21  t22  ...   t2m  φ_R(t2) = c
    ...  ...  ...   ...  ...
    ts1  ts2  ...   tsm  φ_R(ts) = c

Figure 7. A constant relation Φ_R.

For example, in the probabilistic model, a constant relation Φ_R represents a uniform distribution. In the standard relational database model, we may set c = 1 in all relations if we are not interested in counting duplicate tuples. In general, however, c can be any positive real number in the extended data model.
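Lemma 1's counting characterization can be checked mechanically; a sketch (dict-per-tuple representation; helper names are illustrative), using the same five-tuple relation that appears later in Figure 11:

```python
def n(W, cond, rows):
    """n_W[cond](r): number of distinct W-values among tuples matching
    cond, where cond maps attributes to required values (eq. (3))."""
    return len({tuple(t[a] for a in W)
                for t in rows if all(t[a] == v for a, v in cond.items())})

R = ["A1", "A2", "A3"]
rows = [dict(zip(R, v)) for v in
        [(0, 0, 0), (0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1)]]

# Lemma 1 with X = {A2}, Y = {A1}, Z = {A3}: for every tuple xyz in r,
# n_R[X = x] = n_Y[XZ = xz] * n_Z[XY = xy].
ok = all(
    n(R, {"A2": t["A2"]}, rows)
    == n(["A1"], {"A2": t["A2"], "A3": t["A3"]}, rows)
       * n(["A3"], {"A2": t["A2"], "A1": t["A1"]}, rows)
    for t in rows)
print(ok)  # True: the MVD A2 ->-> A1 holds in this relation
```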

Theorem 1. Let Φ_R be a constant relation over R ∪ {f_φR}, and let X and Y be disjoint subsets of R and Z = R − XY. Relation Φ_R satisfies the generalized multivalued dependency (GMVD) X −◦→ Y, i.e., Φ_R = Φ_R↓XY ⊗ Φ_R↓XZ, if and only if the relation Φ_R[R] satisfies the multivalued dependency (MVD) X →→ Y.

Proof: Let Φ_R be defined by the function φ_R, and write Φ_R[R] for the projection of Φ_R onto R. By definition of marginalization, the marginal Φ_R↓XY is a relation over the set of attributes XY ∪ {f_φXY} defined by the function φ_R1 on R1 = XY: for any configuration t_XY = xy = t[R1] with t ∈ Φ_R[R],

    φ_R1(t[R1]) = φ_XY(xy) = Σ_{t_Z} φ_R(t_XY ∗ t_Z) = Σ_z φ_R(xyz),    (7)

where z = t_Z = t[Z] and ∗ denotes concatenation of tuples. Since φ_R is a constant function, i.e., φ_R(t) = c for any configuration t ∈ Φ_R[R], it follows that:

    φ_XY(xy) = Σ_z φ_R(xyz) = c · n_Z[XY = xy](Φ_R[R]),    (8)

where n_Z[XY = xy](Φ_R[R]) is the number of distinct Z-values for a given XY-value xy in Φ_R[R]. Similarly, the marginal Φ_R↓XZ is a relation over XZ ∪ {f_φXZ} defined by the function φ_R2 on R2 = XZ: for any configuration t_XZ = xz = t[R2] with t ∈ Φ_R[R],

    φ_R2(t[R2]) = φ_XZ(xz) = Σ_{t_Y} φ_R(t_X ∗ t_Y ∗ t_Z) = Σ_y φ_R(xyz),    (9)


where y = t_Y = t[Y]. We obtain:

    φ_XZ(xz) = Σ_y φ_R(xyz) = c · n_Y[XZ = xz](Φ_R[R]),    (10)

where n_Y[XZ = xz](Φ_R[R]) is the number of distinct Y-values for a given XZ-value xz in Φ_R[R].

By definition, the relation Φ_R↓XY ⊗ Φ_R↓XZ is defined by the following function ρ_R on R = XYZ: for any configuration xyz ∈ Φ_R[XY] ⋈ Φ_R[XZ],

    ρ_R(xyz) = ( φ_XY(xy) · φ_XZ(xz) ) / φ_X(x).    (11)

The function φ_{R1∩R2} = φ_X on X is defined by:

    φ_{R1∩R2}(t[X]) = φ_X(x) = Σ_{t_YZ} φ_R(t_X ∗ t_YZ)
                             = Σ_{yz} φ_R(xyz)
                             = c · n_R[X = x](Φ_R[R]),    (12)

where n_R[X = x](Φ_R[R]) is the number of distinct tuples for a given X-value x in Φ_R[R]. Thus, from equations (8), (10), (11), and (12), the function ρ_R(xyz) which defines the monotone join can be expressed as: for any configuration xyz ∈ Φ_R[XY] ⋈ Φ_R[XZ],

    ρ_R(xyz) = ( c · n_Z[XY = xy](Φ_R[R]) · c · n_Y[XZ = xz](Φ_R[R]) ) / ( c · n_R[X = x](Φ_R[R]) ).    (13)

On the other hand, the relation Φ_R is defined by the constant function φ_R on R, i.e., for any configuration xyz = t[R] ∈ Φ_R[R], φ_R(xyz) = c. Thus, the condition Φ_R = Φ_R↓XY ⊗ Φ_R↓XZ is satisfied if and only if ρ_R = φ_R. It is not difficult to see that the condition ρ_R = φ_R is equivalent to:

(i) Φ_R[R] = Φ_R[XY] ⋈ Φ_R[XZ],

(ii) for any X-value x, Y-value y, and Z-value z such that the XYZ-value xyz appears in Φ_R[R],

    n_R[X = x](Φ_R[R]) = n_Y[XZ = xz](Φ_R[R]) · n_Z[XY = xy](Φ_R[R]).

It should be noted that conditions (i) and (ii) are in fact not independent. We can immediately conclude from Lemma 1 that the relation Φ_R[R] satisfies the MVD X →→ Y if and only if condition (ii) holds, that is, if and only if Φ_R = Φ_R↓XY ⊗ Φ_R↓XZ.

Theorem 1 clearly indicates that, for a constant distribution, Φ_R = Φ_R↓XY ⊗ Φ_R↓XZ if and only if the relation Φ_R[R] satisfies the MVD X →→ Y. For an arbitrary distribution, however, Φ_R = Φ_R↓XY ⊗ Φ_R↓XZ implies that the relation Φ_R[R] satisfies the MVD X →→ Y,


That is, the multivalued dependency A2 →→ A1 (or A2 →→ A3) holds in the relation Φ_A1A2A3[A1A2A3]. This example clearly demonstrates that, in general, if Φ_R = Φ_R↓XY ⊗ Φ_R↓XZ, then X →→ Y holds in Φ_R[R], but the converse is not necessarily true.

Consider the constant relation Ψ_A1A2A3 illustrated in Figure 12. It can be easily verified that in this special case, Ψ_A1A2A3 = Ψ_A1A2A3↓A1A2 ⊗ Ψ_A1A2A3↓A2A3 holds if and only if Ψ_A1A2A3[A1A2A3] = Ψ_A1A2A3[A1A2] ⋈ Ψ_A1A2A3[A2A3] holds.

Φ_A1A2A3↓A1A2 ⊗ Φ_A1A2A3↓A2A3 = Φ_A1A2A3↓A1A2 × Φ_A1A2A3↓A2A3 × (Φ_A1A2A3↓A2)⁻¹

      A1  A2  f_φA1A2       A2  A3  f_φA2A3       A2  f_φA2⁻¹
  =   0   0   1c        ×   0   0   1c        ×   0   1/c
      0   1   5c            1   0   6c            1   1/(15c)
      1   1   10c           1   1   9c

      A1  A2  A3  f_{φA1A2 · φA2A3}               A2  f_φA2⁻¹
      0   0   0   1c²
  =   0   1   0   30c²                        ×   0   1/c
      0   1   1   45c²                            1   1/(15c)
      1   1   0   60c²
      1   1   1   90c²

      A1  A2  A3  f_{φA1A2 · φA2A3 · φA2⁻¹}
      0   0   0   1c
  =   0   1   0   2c       = Φ_A1A2A3.
      0   1   1   3c
      1   1   0   4c
      1   1   1   6c

Figure 10. Φ_A1A2A3 decomposes losslessly into relations Φ_A1A2A3↓A1A2 and Φ_A1A2A3↓A2A3.

    A1  A2  A3        A1  A2        A2  A3
    0   0   0         0   0         0   0
    0   1   0     =   0   1     ⋈   1   0
    0   1   1         1   1         1   1
    1   1   0
    1   1   1

Figure 11. Φ_A1A2A3[A1A2A3] = Φ_A1A2A3[A1A2] ⋈ Φ_A1A2A3[A2A3].

    A1  A2  A3  f_ψA1A2A3
    0   0   0   c
    0   1   0   c
    0   1   1   c
    1   1   0   c
    1   1   1   c

Figure 12. The constant relation Ψ_A1A2A3.

3.2. Generalized acyclic join dependency

First, let us introduce some notions of graph theory pertinent to our discussion. A hypergraph is a pair (N, S), where N is a finite set of nodes (attributes) and S is a set of edges (hyperedges), which are arbitrary subsets of N (Berge, 1973, Shafer, 1991). If the nodes are understood, we will use S to denote the hypergraph (N, S). An ordinary undirected graph (without self-loops) is, of course, a hypergraph whose every edge is of size two. We say an element Si in a hypergraph S is a twig if there exists another element Sj in S, distinct from Si, such that (∪(S − {Si})) ∩ Si = Si ∩ Sj. We call any such Sj a

branch for the twig Si. A hypergraph S is a hypertree (Jensen, 1988, Shafer, 1991) if its elements can be ordered, say S1, S2, ..., SN, so that Si is a twig in {S1, S2, ..., Si}, for i = 2, ..., N. We call any such ordering a tree (hypertree) construction ordering for S. Given a tree construction ordering S1, S2, ..., SN, we can choose, for i from 2 to N, an integer j(i) such that 1 ≤ j(i) ≤ i − 1 and Sj(i) is a branch for Si in {S1, S2, ..., Si}. We call a function j(i) that satisfies this condition a branching for S and S1, S2, ..., SN. For example, let N = {A1, A2, ..., A6}. Consider a hypergraph S = {S1 = {A1, A2, A3}, S2 = {A1, A2, A4}, S3 = {A2, A3, A5}, S4 = {A5, A6}}. This hypergraph is a hypertree, as there exists a tree construction ordering, S3, S1, S2, S4. Furthermore, the branching function for this ordering is j(1) = 3, j(2) = 1, j(4) = 3.

Given a tree construction ordering S1, S2, ..., SN for a hypertree S and a branching function j(i) for this ordering, we can construct the following set of subsets: L = {Sj(2) ∩ S2, Sj(3) ∩ S3, ..., Sj(N) ∩ SN}. It is important to note that this set L is independent of the tree construction ordering, i.e., L is the same for any tree construction ordering of a given hypertree. We call L the interaction set of the hyperedges in S.

Let Φi denote a relation over relation scheme Si = Ri ∪ {f_φRi}, in which a real-valued function φi on Ri defines the values of the attribute f_φi in relation Φi. We call a set of relations Φ = {Φ1, Φ2, ..., ΦN} a database over the database scheme S = {S1, S2, ..., SN}. Note that by construction, each attribute f_φRi (1 ≤ i ≤ N) is unique and therefore appears only in one relation scheme.

The hypergraph of a database scheme S = {S1, S2, ..., SN} has as its set of nodes those attributes that appear in one or more of the Si's, and as its set of edges S. A
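The twig condition gives a direct way to verify a tree construction ordering such as S3, S1, S2, S4 in the example above; a sketch (function name is illustrative):

```python
def is_tree_construction_ordering(order):
    """Check that each S_i (i >= 2) is a twig in {S_1, ..., S_i}: the
    attributes S_i shares with the union of the earlier edges must all
    lie inside a single earlier edge S_j (its branch)."""
    for i in range(1, len(order)):
        prefix = order[:i]
        shared = set().union(*prefix) & set(order[i])
        if not any(shared <= set(s) for s in prefix):
            return False
    return True

# The example hypergraph: a hypertree with ordering S3, S1, S2, S4.
S1, S2 = {"A1", "A2", "A3"}, {"A1", "A2", "A4"}
S3, S4 = {"A2", "A3", "A5"}, {"A5", "A6"}
print(is_tree_construction_ordering([S3, S1, S2, S4]))  # True
```

Not every ordering of a hypertree's edges is a tree construction ordering; for instance, placing S3 last fails, since S3 shares attributes with three different earlier edges.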


in consolidating the information relevant to decision making. It would be impractical to represent probabilistic knowledge by a joint distribution without using any conditional independencies, because such a representation would require an exponentially large number of variable combinations.

In this section, we will first show that probabilistic conditional independence is equivalent to the notion of generalized multivalued dependency (GMVD) introduced earlier in our relational data model. We will then demonstrate that, with conditional independencies, a joint probability distribution can be represented as a generalized acyclic join dependency (GAJD).

4.1. Conditional independence

Given a joint probability distribution φ_R on a set of attributes (variables) R, we can construct another function, called a marginal distribution, on a subset X of R. Let t denote a configuration of R and let t_X = t[X] be a configuration of X ⊆ R. The marginal distribution φ_X of φ_R is a function on X defined by: for any configuration t_X,

    φ_X(t_X) = Σ_{t_{R−X}} φ_R(t_X ∗ t_{R−X}),    (14)

where t_{R−X} = t[R − X] and t = t_X ∗ t_{R−X}. The symbol ∗ denotes concatenation.

Let X, Y, and Z be disjoint subsets of R = XYZ. We say that Y and Z are conditionally independent given X if, for any XYZ-value xyz,

    φ_XYZ(xyz) / φ_XZ(xz) = φ_XY(xy) / φ_X(x),    (15)

where φ_XZ(xz) > 0 and φ_X(x) > 0. For convenience, we write the marginal distribution φ_W(t[W]) as φ(t[W]) and φ(x, y, z, ...) as φ(xyz...), if no confusion arises. Thus, condition (15) can be expressed as:

    φ(y|xz) = φ(xyz) / φ(xz) = φ(xy) / φ(x) = φ(y|x).    (16)

Alternatively, conditional independence can be defined by:

    φ(yz|x) = φ(y|x) · φ(z|x)    (17)

or

    φ(xyz) = φ(xy) · φ(xz) / φ(x).    (18)

In the proposed data model, a joint probability distribution φ_R on a set of variables R can be represented as a relation Φ_R (see Figure 1). By the definition of marginalization given in Section 3, the relation Φ_R↓X over X ⊆ R represents the marginal distribution φ_X of the joint distribution φ_R, as defined by equation (14). Thus, it follows immediately that the independence condition (18) can be equivalently stated as a generalized multivalued dependency, namely:

    Φ_R = Φ_R↓XY × Φ_R↓XZ × (Φ_R↓X)⁻¹
        = Φ_R↓XY ⊗ Φ_R↓XZ,    (19)

where relations Φ_R↓XY, Φ_R↓XZ, and Φ_R↓X represent the marginal distributions φ_XY, φ_XZ, and φ_X, respectively, and (Φ_R↓X)⁻¹ is the inverse relation of Φ_R↓X.

Φ_R = Φ_A1A2A3:
    A1  A2  A3  f_φR
    0   0   0   φ_R(0, 0, 0) = 1/16
    0   1   0   φ_R(0, 1, 0) = 2/16
    0   1   1   φ_R(0, 1, 1) = 3/16
    1   1   0   φ_R(1, 1, 0) = 4/16
    1   1   1   φ_R(1, 1, 1) = 6/16

Figure 17. The joint distribution φ_R represented as the relation Φ_R.

Let us summarize the above results by the following theorem.

Theorem 2. Let φ_R be a joint probability distribution on a set of attributes R, and let X, Y, and Z be disjoint subsets of R = XYZ. The subsets Y and Z are conditionally independent given X if and only if the relation Φ_R defined by φ_R satisfies the generalized multivalued dependency X −◦→ Y (or X −◦→ Z).

The following example demonstrates the notion of generalized multivalued dependency.

Example: Consider the joint probability distribution φ_R(a1, a2, a3) defined on a set of random variables R = {A1, A2, A3}:

    φ_R(0, 0, 0) = 1/16,
    φ_R(0, 1, 0) = 2/16,
    φ_R(0, 1, 1) = 3/16,
    φ_R(1, 1, 0) = 4/16,
    φ_R(1, 1, 1) = 6/16.

This joint distribution φ_R can be conveniently represented by a relation Φ_R in the extended relational data model, as shown in Figure 17. Note that this relation is obtained by substituting c = 1/16 into the relation defined in Example 1. One can easily verify that A1 and A3 are conditionally independent given A2, namely:
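The claimed independence can be verified numerically via condition (18), cross-multiplied to avoid dividing by zero marginals; a short check with X = {A2}, Y = {A1}, Z = {A3}:

```python
from fractions import Fraction
from itertools import product

# The joint distribution of the example (zero on unlisted configurations).
phi = {(0, 0, 0): Fraction(1, 16), (0, 1, 0): Fraction(2, 16),
       (0, 1, 1): Fraction(3, 16), (1, 1, 0): Fraction(4, 16),
       (1, 1, 1): Fraction(6, 16)}

# Marginal: sum phi over the unspecified attributes.
p = lambda a1=None, a2=None, a3=None: sum(
    v for (x1, x2, x3), v in phi.items()
    if a1 in (None, x1) and a2 in (None, x2) and a3 in (None, x3))

# Condition (18), cross-multiplied: phi(a1 a2 a3) * phi(a2) = phi(a1 a2) * phi(a2 a3).
ok = all(p(a1, a2, a3) * p(a2=a2) == p(a1, a2) * p(a2=a2, a3=a3)
         for a1, a2, a3 in product((0, 1), repeat=3))
print(ok)  # True: A1 and A3 are conditionally independent given A2
```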


Berge, 1973), namely, R(G) is a hypertree. Let L denote the interaction set of the hyperedges in R(G), as defined in Section 3.2.

Lemma 2 (Hajek et al., 1992). If a joint probability distribution φ is decomposable relative to a chordal undirected graph G, then φ can be written as a product of the marginal distributions of the maximal cliques of G, divided by a product of the marginal distributions of the interaction set of R(G).

We call a distribution defined by Lemma 2 a Markov distribution. (Note that this definition is different from the one given in (Pearl, 1988).) It should perhaps be noted that the computation of marginal distributions is a major problem in practical applications of Bayesian networks, as it may easily become intractable (Cooper, 1990). Fortunately, many efficient algorithms based on the techniques of local propagation (Jensen, 1988, Shafer, 1991) have been developed for computing the marginals of a factorized joint probability distribution.

Suppose a joint distribution φ_R is decomposable relative to a chordal graph G. Let R = {R1, R2, ..., RN} denote the set of hyperedges of the hypertree R(G), and let R = R1 ∪ R2 ∪ ... ∪ RN. Each hyperedge Ri in R(G) defines a marginal distribution φi of φ_R. Let L = {Rj(2) ∩ R2, Rj(3) ∩ R3, ..., Rj(N) ∩ RN} be the interaction set of R(G), in which we have tacitly assumed that the sequence R1, R2, ..., RN is a tree construction ordering for R(G). The joint probability distribution φ_R can be represented as a relation Φ_R over the set of attributes S = R ∪ {f_φR}, where the values of the attribute f_φR are defined by the function φ_R. Similarly, each marginal distribution φi (1 ≤ i ≤ N) is represented by a relation Φ_R↓Ri over Si = Ri ∪ {f_φi}, where the values of the attribute f_φi are defined by the function φi.

By Lemma 2 and the definition of product join, the relation ΦR over S can be expressed as:

ΦR = ΦR^{↓R1} × ΦR^{↓R2} × ... × ΦR^{↓RN} × (ΦR^{↓Rj(2)∩R2})^{-1} × (ΦR^{↓Rj(3)∩R3})^{-1} × ... × (ΦR^{↓Rj(N)∩RN})^{-1}.   (25)

Since the sequence R1, R2, ..., RN is a tree construction ordering for R(G), we have for 1 ≤ j(i) ≤ i − 1 and i = 2, 3, ..., N:

(R1 R2 ... Ri−1) ∩ Ri = Rj(i) ∩ Ri,   (26)

where Rj(i) is a branch of the twig Ri in the hypertree. Thus equation (25) can be written as a sequential monotone join expression as defined in Section 3.2:

ΦR = (...((ΦR^{↓R1} ⊗ ΦR^{↓R2}) ⊗ ΦR^{↓R3}) ... ⊗ ΦR^{↓RN}) = θ(Φ).   (27)

This means that the relation ΦR satisfies the generalized acyclic join dependency ⊗{S1, S2, ..., SN} = ⊗S, where Φ = {ΦR^{↓R1}, ΦR^{↓R2}, ..., ΦR^{↓RN}} is a database over the database scheme S = {S1, S2, ..., SN} = {R1 ∪ {fφ1}, R2 ∪ {fφ2}, ..., RN ∪ {fφN}}.


200 WONG

ΨR = ΨABCDEG:

A B C D E G   fψR
0 0 0 0 1 1   ψR(0, 0, 0, 0, 1, 1) = 1/6
1 0 0 1 1 0   ψR(1, 0, 0, 1, 1, 0) = 2/6
0 1 1 1 0 0   ψR(0, 1, 1, 1, 0, 0) = 3/6

Figure 21. The joint distribution represented as a relation ΨR.

Theorem 3 A decomposable joint probability distribution is equivalent to a generalized acyclic join dependency.

The implication of this theorem is perhaps best illustrated by the following example.

Example: Consider the following joint probability distribution ψR(a, b, c, d, e, g) defined on a set of random variables R = {A, B, C, D, E, G}:

ψR(0, 0, 0, 0, 1, 1) = 1/6,
ψR(1, 0, 0, 1, 1, 0) = 2/6,
ψR(0, 1, 1, 1, 0, 0) = 3/6.

This joint distribution can be represented by a relation ΨR in the extended relational data model, as illustrated in Figure 21. Note that the above relation is obtained by substituting c = 1/6 in the relation defined in Example 2. One can easily verify that the joint distribution ψR can be factorized as follows:

ψR = ψABCDEG = (ψABC · ψABD · ψBCE · ψACG) / (ψAB · ψBC · ψAC).

By definition, ψR is decomposable. The marginal distributions ψABC, ψABD, ψBCE, ψACG, ψAB, ψBC, and ψAC of ψR respectively define the marginal relations ΨR^{↓ABC}, ΨR^{↓ABD}, ΨR^{↓BCE}, ΨR^{↓ACG}, (ΨR^{↓AB})^{-1}, (ΨR^{↓BC})^{-1}, and (ΨR^{↓AC})^{-1} of ΨR, depicted in Figure 22.

Based on the results obtained in Example 2, it immediately follows that the above decomposable joint probability distribution ΨABCDEG can be expressed as:

ΨR = ΨABCDEG = (((ΨR^{↓ABC} ⊗ ΨR^{↓ABD}) ⊗ ΨR^{↓BCE}) ⊗ ΨR^{↓ACG}).

That is, this example clearly shows that a decomposable distribution is indeed equivalent to a generalized acyclic join dependency in the extended relational data model.
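The factorization claimed in the example can be verified mechanically. The sketch below (Python, using exact rational arithmetic; the helper names are illustrative) computes the seven marginals of ψR and confirms the factorization on all 64 configurations, including the zero-probability ones:

```python
from itertools import product
from fractions import Fraction as F

ATTRS = "ABCDEG"
psi = {
    (0, 0, 0, 0, 1, 1): F(1, 6),
    (1, 0, 0, 1, 1, 0): F(2, 6),
    (0, 1, 1, 1, 0, 0): F(3, 6),
}

def marg(names):
    """Marginal of psi on the attributes listed in `names`, e.g. 'ABC'."""
    idx = [ATTRS.index(a) for a in names]
    out = {}
    for config, p in psi.items():
        key = tuple(config[i] for i in idx)
        out[key] = out.get(key, F(0)) + p
    return out

NUM, DEN = ("ABC", "ABD", "BCE", "ACG"), ("AB", "BC", "AC")
marginals = {s: marg(s) for s in NUM + DEN}

def value(names, config):
    """Look up the marginal on `names` at the sub-configuration of `config`."""
    key = tuple(config[ATTRS.index(a)] for a in names)
    return marginals[names].get(key, F(0))

# Check psi = (psi_ABC * psi_ABD * psi_BCE * psi_ACG) / (psi_AB * psi_BC * psi_AC)
# everywhere; whenever a denominator marginal vanishes, the corresponding
# numerator marginal and psi itself vanish too.
for config in product([0, 1], repeat=6):
    top = F(1)
    for s in NUM:
        top *= value(s, config)
    bot = F(1)
    for s in DEN:
        bot *= value(s, config)
    expected = psi.get(config, F(0))
    if bot == 0:
        assert top == 0 and expected == 0
    else:
        assert top / bot == expected
print("psi_ABCDEG factorizes as claimed")
```

Using Fraction avoids floating-point rounding, so the equality test is exact; this is the same check that equation (27) performs structurally via the sequential joins of the marginal relations.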



R. Dechter, “Decomposing a Relation into a Tree of Binary Relations,” Journal of Computer and System Sciences, vol. 41, pp. 2-24, 1990.
C. Delobel, “Normalization and hierarchical dependencies in the relational data model,” ACM Transactions on Database Systems, vol. 3, no. 3, pp. 201-222, 1978.
R. Fagin, A.O. Mendelzon and J.D. Ullman, “A simplified universal relation assumption and its properties,” ACM Transactions on Database Systems, vol. 7, no. 3, pp. 343-360, 1982.
R. Fagin, “Multivalued dependencies and a new normal form for relational databases,” ACM Transactions on Database Systems, vol. 2, no. 3, pp. 262-278, 1977.
P. Hajek, T. Havranek and R. Jirousek, Uncertain Information Processing in Expert Systems. CRC Press, 1992.
J. Hill, “Comment,” Statistical Science, vol. 8, no. 3, pp. 258-261, 1993.
F.V. Jensen, “Junction trees - a new characterization of decomposable hypergraphs,” Research Report, JUDEX, Aalborg, Denmark, 1988.
R. Kruse, E. Schwecke and J. Heinsohn, Uncertainty and Vagueness in Knowledge Based Systems. Springer-Verlag, 1988.
S. Lauritzen and D. Spiegelhalter, “Local Computations with Probabilities on Graphical Structures and their Application to Expert Systems,” Journal of the Royal Statistical Society, B, vol. 50, no. 2, pp. 157-224, 1988.
T.T. Lee, “An Algebraic Theory of Relational Databases,” The Bell System Technical Journal, vol. 62, no. 10, pp. 3159-3204, 1983.
T.T. Lee, “An information-theoretic analysis of relational databases - Part I: data dependencies and information metric,” IEEE Transactions on Software Engineering, vol. SE-13, no. 10, pp. 1049-1061, 1987.
D. Maier, The Theory of Relational Databases. Computer Science Press, 1983.
R.E. Neapolitan, Probabilistic Reasoning in Expert Systems. John Wiley & Sons, Inc., 1990.
J. Pearl, “Fusion, propagation and structuring in belief networks,” Artificial Intelligence, vol. 29, no. 3, pp. 241-288, 1986.
J. Pearl and T. Verma, “The Logic of Representing Dependencies by Directed Graphs,” AAAI-87 Sixth National Conference on Artificial Intelligence, vol. 1, pp. 374-379, 1987.
J. Pearl, Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
G. Shafer, P. Shenoy and K. Mellouli, “Propagating Belief Functions in Qualitative Markov Trees,” International Journal of Approximate Reasoning, vol. 1, pp. 349-400, 1987.
G. Shafer, “An axiomatic study of computation in hypertrees,” School of Business Working Paper Series, No. 232, University of Kansas, Lawrence, 1991.
S.K.M. Wong, C.J. Butz and Y. Xiang, “A method for implementing a probabilistic model as a relational database,” Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 556-564, 1995.
S.K.M. Wong, “Testing implication of probabilistic dependencies,” Twelfth Conference on Uncertainty in Artificial Intelligence, pp. 545-553, 1996.