
Network Fragments: Representing Knowledge for Constructing Probabilistic Models

Kathryn Blackmond Laskey
Department of Systems Engineering and C3I Center
George Mason University
Fairfax, VA 22030
[email protected]

Suzanne M. Mahoney
Information Extraction and Transport, Inc.
1730 N. Lynn Street, Suite 502
Arlington, VA
[email protected]

Abstract

In most current applications of belief networks, domain knowledge is represented by a single belief network that applies to all problem instances in the domain. In more complex domains, problem-specific models must be constructed from a knowledge base encoding probabilistic relationships in the domain. Most work in knowledge-based model construction takes the rule as the basic unit of knowledge. We present a knowledge representation framework that permits the knowledge base designer to specify knowledge in larger semantically meaningful units which we call network fragments. Our framework provides for representation of asymmetric independence and canonical intercausal interaction. We discuss the combination of network fragments to form problem-specific models to reason about particular problem instances. The framework is illustrated using examples from the domain of military situation awareness.

1 INTRODUCTION

The vast majority of published applications of belief networks consist of template models. A template model is appropriate for problem domains in which the relevant variables, their state spaces, and their probabilistic relationships do not vary from problem instance to problem instance. Thus, generic knowledge about the domain can be represented by a fixed belief network over a fixed set of variables, obtained by some combination of expert judgment and learning from observation. Problem solving for a particular case is performed by conditioning the network on case-specific evidence and computing the posterior distributions of variables of interest. For example, a medical diagnosis template network would contain variables representing background information about a patient, possible medical conditions the patient might be experiencing, and clinical findings that might be observed. The network encodes probabilistic relationships among these variables. To perform diagnosis on a particular patient, background information and findings for the patient are entered as evidence and the posterior probabilities of the possible medical conditions are reported. Although values of the evidence variables vary from patient to patient, the relevant variables and their probabilistic relationships are assumed to be the same for all patients. It is this assumption that justifies the use of template models.

The development of efficient belief propagation algorithms for template models enabled an explosion of research and applications of probability models in intelligent systems (e.g., Pearl, 1988; Jensen, 1996). As belief network technology is applied to more complex problems, the limitations of template models become clear. Even when a domain can be represented by a template model, its size and complexity may make it necessary to represent it implicitly as a collection of modular subunits from which smaller submodels are constructed for reasoning about problem instances (Pradhan et al., 1994). In more complex domains, template models are insufficient as a knowledge representation because the relevant variables and their interrelationships vary from problem instance to problem instance. In such domains, belief networks can still be used to capture stable patterns of probabilistic relationships for pieces of the domain, and these pieces can be brought together to build probability models to reason about particular problem instances (Wellman, Breese and Goldman, 1992; Goldman and Charniak, 1993). There has been steady interest in automated construction of belief network models in fields such as natural language understanding (Goldman and Charniak, 1993), military situation assessment (Laskey et al., 1993), image understanding (Levitt et al., 1990), financial securities trading (Breese, 1987), and plan projection (Ngo et al., 1996).

This paper presents a knowledge representation framework to support automated construction of problem-specific models from a knowledge base expressing generic probabilistic relationships. Most work on automated network construction takes as the unit of knowledge a set of probabilistic influences on a single variable. That is, an element of the knowledge base specifies a variable, some or all of its parents, and information used to construct its local distribution in the constructed model. For a number of reasons, it is useful to have the capability to organize domain knowledge in larger chunks. Domain experts often consider a related set of variables together. The ability to represent conceptually meaningful groupings of variables and their interrelationships facilitates both knowledge elicitation and knowledge base maintenance (Mahoney and Laskey, 1996). Also, larger situation-specific models tend to include these conceptually meaningful groupings as submodels. Thus, a model construction algorithm can be made more efficient by searching for and instantiating submodels over sets of related variables.

Our representation therefore takes as its basic unit the network fragment, which consists of a set of related variables together with knowledge about the probabilistic relationships among the variables. We discuss how network fragments can be combined to form larger models for reasoning about a given problem instance. Our focus is on the representation of probabilistic knowledge as network fragments and not on algorithms for constructing models from the knowledge base.

2. MILITARY SITUATION ASSESSMENT

The application area for our work is the domain of military situation assessment. We give a brief description of this application area, both to illustrate the complexities of the domain and to provide examples for later reference.

A military intelligence analyst is charged with constructing a description of a military situation: who the actors are, where they are located, what they are doing, and what they are likely to do in the future. To do this, the analyst uses her knowledge of military doctrine and tactics, knowledge about the capabilities and limitations of military forces and equipment, background information about weather and terrain, and reports about the current situation from various sources including radar, imagery, communications traffic, and human informants. Reasoning is performed at different levels of aggregation. For example, an SA6 surface-to-air missile regiment comprises several batteries and a command post, and each of these subunits itself comprises elements such as launchers, reloaders, and radars. For some purposes the analyst may reason about a regiment in the aggregate; for other purposes she may reason about the individual subunits (batteries and command post) comprising the regiment. The analyst must also reason about the evolution of the situation in time.

As an illustration, consider an analyst who has received a report R3 of a radar emission characteristic of a Straight Flush radar. The report is accompanied by an error ellipse which indicates a region within which the radar may be located. A Straight Flush radar is characteristic of a surface-to-air missile battery of type SA6. The analyst considers her current situation model, focusing on the area within the error ellipse of the report. She had previously received an imagery report R2 indicating a unit of unidentified type within the ellipse. The analyst considers the hypothesis that reports R3 and R2 refer to the same unit. In addition, there was a prior report R1 of a Straight Flush radar. The two error ellipses show little overlap. The analyst therefore considers it possible but unlikely that R1 and R3 came from the same unit at the same location. The report R1 was received several hours ago, so the analyst considers whether the two reports came from a single battery that moved during the time between the reports. Yet another possibility the analyst considers is that the report came from a new, not yet observed, SA6 battery. Under each of these possibilities for the unit giving rise to the report, the analyst must consider the aggregation of the SA6 batteries in the region into regiments. Batteries in a regiment are typically spaced so that there is some overlap in the airspace they are covering, and so that they provide the widest possible area of coverage. She also considers various possibilities for the military target or region the regiment is defending.

This brief vignette covers only a small subset of the reports our analyst receives about the situation over the course of a day. Each report must be considered in the light of her current view of the situation and used to refine her estimate of what is happening. She must reason not just about the current situation but also about how it is likely to evolve. Her description of the situation provides input to her commander, who must plan a course of action to respond to what the opposing force is likely to do.

It is clear that a template model is inadequate for this problem. The number of actors of any given type is not static, but varies from situation to situation. A reasoning system must be capable of unifying reports with already-hypothesized units and/or hypothesizing new units, as the current problem context demands. The relevant variables for reasoning about an actor depend on the type of actor it is. For example, the mode in which a radar emits is a key variable for inferring the activity of a surface-to-air missile battery. However, this variable is simply not applicable to units which have no radar. Clearly a network with a fixed set of variables and a fixed topology is inadequate for this problem.

3. NETWORK FRAGMENTS

3.1 NETWORK FRAGMENTS AS OBJECTS

We have found it useful to express our representation framework in the language of object-oriented analysis (Rumbaugh et al., 1991). An advantage of the object-oriented approach is the ability to represent abstract types. Objects of a given type share structure (common attributes) and behavior (common methods). Another important feature is inheritance. Objects can be organized in hierarchies of related objects. From an implementation viewpoint, this facilitates knowledge base development and maintenance. It is much easier to specify a new object type, especially one similar to an existing object type, when much of its structure and behavior are inherited from its parent in the object hierarchy. Maintenance is simplified because changes to structure or behavior need be made only at the level of the hierarchy at which the knowledge is specified, and automatically propagate to all objects inheriting from the changed object. Another advantage of the object-oriented approach is the ability to encapsulate private knowledge within an object. In a related paper, Koller and Pfeffer (1997) discuss the role of encapsulation in the design of large, complex belief network knowledge bases. Finally, objects provide a natural way to represent first-order knowledge about families of problem-specific models. Object classes are used to represent generic knowledge about types of domain entity. In a given problem situation, one or more instances of an object class may be created to reason about particular entities of a given type.

In our framework, there are two basic categories of object: the random variable and the network fragment. Random variables represent the uncertain propositions about which the system reasons; network fragments represent probabilistic relationships between these propositions. Random variable and network fragment classes represent knowledge about generic domain entities. During problem solving, instances of these random variable and network fragment classes are created in a model workspace to represent attributes of particular domain entities and their interrelationships.

3.2 RANDOM VARIABLES

Random variables represent aspects of a situation about which the reasoner may be uncertain. Each random variable class has a set of identifying attributes, which are bound to particular values when an instance of the random variable is created. For example, the random variable class (SA6 Battery Activity <Unit-ID> <t>) represents the activity of an SA6 battery. Its identifying attributes are <Unit-ID>, which refers to the particular unit, and <t>, which refers to when the activity is taking place. These attributes are bound to particular values when an instance is created to refer to a particular situation.

Definition 1: A random variable is an object with the following attributes and methods:

· Name. This is a unique name for the variable class.

· States. Our current representation assumes that a random variable has a fixed finite set of possible states. This could be generalized to allow a random variable to have an associated method for determining its state space for the context in which it is instantiated.

· Identifying attributes. Each random variable has a set of identifying attributes. These attributes are bound to specific values when the random variable is instantiated.

· Influence combination. This is a method for constructing the local distribution of the random variable from probability information contained in multiple fragments. A commonly used example of an influence combination method is the noisy-OR. Influence combination is discussed in more detail in Section 4 below.

· Default distribution. This is a method for assigning a distribution to the random variable by default when none is explicitly specified.
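To make Definition 1 concrete, the following is a minimal Python sketch of a random variable class and its instantiation. The class and method names (RandomVariable, instantiate) and the example state names are our own illustrative assumptions, not part of the framework as published.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class RandomVariable:
    """Sketch of a random variable class (Definition 1)."""
    name: str                                 # unique name for the variable class
    states: List[str]                         # fixed finite set of possible states
    identifying_attributes: Tuple[str, ...]   # bound to values at instantiation
    influence_combination: Optional[Callable] = None  # e.g., a noisy-OR combiner
    default_distribution: Optional[Callable] = None   # used when none is specified

    def instantiate(self, bindings: Dict[str, str]) -> "RandomVariableInstance":
        # Every identifying attribute must be bound to a concrete value.
        assert set(bindings) == set(self.identifying_attributes)
        return RandomVariableInstance(self, dict(bindings))

@dataclass
class RandomVariableInstance:
    rv_class: RandomVariable
    bindings: Dict[str, str]   # e.g., {"Unit-ID": "B654", "t": "1"}

# The (SA6 Battery Activity <Unit-ID> <t>) class from the text; the state
# names here are invented for illustration.
activity = RandomVariable(
    name="SA6 Battery Activity",
    states=["moving", "setting up", "emitting", "silent"],
    identifying_attributes=("Unit-ID", "t"))
b654_activity = activity.instantiate({"Unit-ID": "B654", "t": "1"})
```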

As is common with the term object, the term random variable may be used to refer either to a class or an instance. When the intent is not clear from the context, the more specific term random variable class or random variable instance will be used.

3.3 ELEMENTARY FRAGMENTS

Network fragments organize sets of random variables and encode the probabilistic relationships among them. The knowledge base designer encodes knowledge in the form of elementary fragment classes, which are instantiated and combined during problem solving into compound fragments. An elementary fragment is a modular, semantically meaningful unit of probability knowledge. Variables within the fragment are classified as resident or input variables. Distributional information for resident variables is represented within the fragment. Input variables are variables that condition resident variables, but whose distributions are carried external to the fragment.

In our domain it is important to be able to express asymmetric independence, or independence relationships between variables that exist only for certain values of other variables (cf. Geiger and Heckerman, 1991; Boutilier et al., 1996). Our framework generalizes the Bayesian multinet, defining a multi-fragment as a collection of hypothesis-conditioned fragments that together specify the distribution of a set of resident variables. Hypothesis-conditioned fragments express knowledge about their input and resident variables conditional on a subset of the state space of hypothesis variables. Hypothesis-conditioned fragments allow parsimonious expression of independence relationships that exist conditional on subsets of the hypothesis variables, but not unconditionally.

A fragment has a set of associated identifying attributes, which map to the identifying attributes of its random variables. For example, Figure 1c is an instance of a hypothesis-conditioned fragment class for reasoning about activity and dwell (length of time at a given location) of a surface-to-air missile unit. The identifying attributes of the fragment correspond to the unit identifier and the previous and current time periods, and are bound to the values <B654>, <0>, and <1>, respectively. The unit identifier attribute points to the corresponding attribute in each of the fragment's random variables. This constrains all random variables in the fragment to refer to the same unit.

Definition 2: An elementary hypothesis-conditioned fragment is an object with the following attributes and methods:

· Random variables. Each random variable associated with a fragment takes as its value a random variable instance of a given random variable class. A nonempty subset R of the fragment's random variables is designated as resident variables; the remaining (possibly empty) set I of random variables is designated as input variables. A subset H of the input variables of fragment F is designated as the hypothesis variable set.

· Hypothesized subset. A subset μ of the Cartesian product of the state spaces of the hypothesis variables is designated as the hypothesized subset for fragment F.

· A set of fragment identifying attributes and a mapping from the fragment identifying attributes to the identifying attributes of the fragment random variables. These identifying attributes play the role of variables in a logic programming language.

· An acyclic directed graph G over I ∪ R, called the fragment graph, in which all nodes in I are root nodes.

· An influence function for each variable in the fragment. The influence function is used by the influence combination method to compute a local distribution for the variable.

· A local distribution for each resident variable in the fragment. The local distribution represents a probability distribution over the state space of the variable given each combination of values of its parents in G.

The local distribution need not be represented explicitly. When a fragment represents a partial influence and contains only a subset of the parents of the variable, the local distribution attribute may be left unspecified or may contain a default distribution to be used when only the parents mentioned in the fragment are included in the constructed model. A model construction system need not compute local distributions until it is ready to use the model for inference.
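The following is a sketch of Definition 2 as a Python class; the field names and the parent-map representation of the fragment graph are our own choices for illustration. The checks in __post_init__ enforce the structural conditions stated above; acyclicity is deferred to combination time (Section 4).

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

State = Tuple[str, ...]   # one joint state of the hypothesis variables

@dataclass
class ElementaryFragment:
    """Sketch of an elementary hypothesis-conditioned fragment (Definition 2)."""
    resident: Set[str]                    # R: nonempty set of resident variables
    inputs: Set[str]                      # I: input variables (possibly empty)
    hypothesis: Set[str]                  # H: subset of the input variables
    hypothesized_subset: Set[State]       # mu: subset of H's joint state space
    identifying_attributes: Dict[str, str] = field(default_factory=dict)
    graph: Dict[str, Set[str]] = field(default_factory=dict)   # child -> parents
    influence_functions: Dict[str, object] = field(default_factory=dict)
    local_distributions: Dict[str, object] = field(default_factory=dict)  # may be unspecified

    def __post_init__(self):
        assert self.resident, "R must be nonempty"
        assert self.hypothesis <= self.inputs, "H must be a subset of I"
        for v in self.inputs:
            assert not self.graph.get(v), "all input nodes must be root nodes"
```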

4 FRAGMENT COMBINATION

4.1 INTRODUCTION

This section describes the process of combining fragment instances into larger models for reasoning about a problem. Figure 1 shows an example of fragment combination for fragments used in reasoning about where an SA6 battery is located and how long it will remain at that location. Figures 1a and 1b focus on location quality. Location quality is important for inference about location because units tend to be placed where location quality is good. The fragment instances in Figure 1a and Figure 1b represent the partial influence on location quality of the degree to which a location supports the unit's mission and the degree to which the location supports its activity. Both these influences are mediated by the unit's activity. These influences are combined with a conditional noisy-MIN influence combination, in which the influences of the two supportability variables combine by a leaky noisy-MIN for each value of the activity variable. Figure 1c is an instance of a fragment expressing knowledge about the unit's activity and how long the unit will remain at its present location. These fragments are combined into the result fragment shown in Figure 1d.

4.2 INFLUENCE COMBINATION

When fragments are combined, local distributions for the combined fragment are computed from the fragment influence functions using the node's influence combination method. The influence function for a variable in a fragment in which it is resident must provide the inputs needed by that variable's influence combination method. Thus, influence combination and influence functions must be designed to work together. A number of generic influence combination methods have appeared in the literature. We describe several common methods below.

The most straightforward influence combination method is Simple-Combination, which requires the variable X to be resident in exactly one fragment containing all its parents. The influence function for X computes its (possibly unnormalized) local distribution, and Simple-Combination simply normalizes this distribution. Using Simple-Combination, it is straightforward to represent a standard Bayesian network over n variables X_1, …, X_n as a set of n network fragments. Each fragment F_i has exactly one resident variable, X_i. The input variables of F_i are the parents of X_i in the original Bayesian network. These fragments combine to yield the original Bayesian network. Slightly more complex than Simple-Combination is Default-Combination, in which a default distribution is overridden by a distribution defined for a more specific set of parent variables.

Another class of influence combination methods consists of methods for combining partial influences. The most commonly cited partial influence models are the independence of causal influence (ICI) models1, the best-known of which is the noisy-OR. For an ICI model, the influence function carries information about the partial influence of a subset of the node's parents. When several fragments expressing partial influences are combined, the node's influence combination method uses the partial influence information computed by each fragment's influence function to compute a local distribution given all the parents. The fragments of Figures 1a and 1b are combined using a modified ICI method.
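As an illustration of ICI combination, the sketch below combines partial influences contributed by several fragments into a noisy-OR local distribution for a binary variable. The dictionary-of-activation-probabilities interface is our own assumption; the paper does not prescribe a signature for influence functions.

```python
from itertools import product
from typing import Dict, List, Tuple

def noisy_or_combination(
        partial_influences: List[Dict[str, float]]) -> Dict[Tuple[bool, ...], float]:
    """Combine partial influences on a binary variable X by a noisy-OR.

    Each dictionary comes from one fragment's influence function and maps a
    binary parent's name to the probability that the parent, when true,
    activates X independently of the other parents.
    """
    activation: Dict[str, float] = {}
    for influence in partial_influences:   # merge parent sets across fragments
        activation.update(influence)
    parents = sorted(activation)
    # P(X = true | config) = 1 - product over true parents of (1 - p_parent)
    table = {}
    for config in product([False, True], repeat=len(parents)):
        p_inactive = 1.0
        for parent, value in zip(parents, config):
            if value:
                p_inactive *= 1.0 - activation[parent]
        table[config] = 1.0 - p_inactive
    return table

# Two fragments each carry a partial influence on the same variable:
cpt = noisy_or_combination([{"Cause-A": 0.8}, {"Cause-B": 0.6}])
# cpt[(True, True)] == 1 - 0.2 * 0.4 == 0.92
```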

Another generic type of influence combination, Parameterized-Combination, occurs when, again, X is resident in a single fragment containing all its parents, but its distribution can be computed from some lower dimensional representation. One such example is the sigmoid function (Jaakkola and Jordan, 1996; Neal, 1992). When the set of influences is known in advance, partial influence models may also be represented using a single home fragment and Parameterized-Combination. The Parameterized-Combination influence function returns the parameters used to compute the local distributions, and influence combination computes the local distribution from the parameters.

1 Independence of causal influence has also been called causal independence (see Heckerman, 1993).

An influence combination method has a set of enabling conditions specifying requirements for applicability of the method. The enabling conditions provide a way for the designer to specify conditions under which the combination method applies. For example, all input nodes to a noisy-OR must be binary, as must the node for which the distribution is being computed. As another example, the method Simple-Combination completes without error only when the variable is resident in exactly one fragment of the input set. Assuming that its enabling conditions are met, the influence combination method computes the variable's local distribution using the results returned by the variable's influence functions from the input fragments in which it is resident.

Combining hypothesis-conditioned fragments requires conditions involving consistency of their hypothesized subsets. Hypothesis-conditioned fragments are organized into multi-fragments (Section 5), which consist of a partition over a set of hypothesis variables together with a set of hypothesis-conditioned fragments defining distributions for resident variables given the hypothesis variables. For this reason, an influence combination method for a variable X takes as inputs not only the fragments whose distributions for X are to be combined, but also the partition element for which the output distribution is being computed. The following definitions establish terminology for the consistency conditions influence combination is required to satisfy.

Definition 3: A hypothesis partition S = (H, Δ) is a set of variables H together with a partition Δ of the Cartesian product of the state spaces of the variables in H. A hypothesis element of the hypothesis partition is an element ν of Δ.

Definition 4: Let F be a hypothesis-conditioned fragment instance with resident variables R_F, input variables I_F, hypothesis variables H_F, and hypothesized subset μ_F. Let S = (H, Δ) be a hypothesis partition and let ν ∈ Δ be a hypothesis element. The fragment F and the hypothesis element ν are hypothesis variable consistent if: (1) H_F ⊂ H, and (2) if X ∈ H and X ∈ (I_F ∪ R_F) then X ∈ H_F. F subsumes ν if ν ⊂ μ_F. F is disjoint from ν if ν ∩ μ_F = ∅.
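Definitions 3 and 4 translate directly into predicates over the ElementaryFragment sketch above. Here we assume, as an implementation convenience, that the hypothesized subset and the hypothesis elements are represented as sets of joint states over a shared variable ordering; the function names are ours.

```python
from typing import FrozenSet, Set, Tuple

State = Tuple[str, ...]

def hypothesis_variable_consistent(frag: "ElementaryFragment", H: Set[str]) -> bool:
    # (1) the fragment's hypothesis variables are contained in H, and
    # (2) every variable of H appearing in the fragment is a hypothesis variable there.
    in_fragment = frag.inputs | frag.resident
    return frag.hypothesis <= H and (H & in_fragment) <= frag.hypothesis

def subsumes(frag: "ElementaryFragment", nu: FrozenSet[State]) -> bool:
    # F subsumes nu when nu is contained in F's hypothesized subset mu_F.
    return nu <= frozenset(frag.hypothesized_subset)

def disjoint(frag: "ElementaryFragment", nu: FrozenSet[State]) -> bool:
    # F is disjoint from nu when nu and mu_F do not intersect.
    return not (nu & frozenset(frag.hypothesized_subset))
```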

[Figure 1: Example of Fragment Combination. Panels: (a) Input Fragment 1; (b) Input Fragment 2; (c) Input Fragment 3; (d) Result Fragment. Nodes include Unit-Type <B654> = SA6 Battery, Location <L723> Mission Supportability, Location <L723> Activity Supportability, Location <L723> Quality (annotated Conditional Noisy-MIN), Activity <B654> <0>, Activity <B654> <1>, Status <B654> <1>, and Dwell <B654>; the legend distinguishes resident, input, and hypothesis variables.]

Page 6: Network Fragments: Representing Knowledge for ...seor.vse.gmu.edu/~klaskey/papers/Fragments.pdfNetwork Fragments: Representing Knowledge for Constructing Probabilistic Models Kathryn

Hypothesis variable consistency simply means that the fragment and the hypothesis partition agree on which variables are designated as hypothesis variables. All variables in the hypothesis partition that appear in the fragment must be designated there as hypothesis variables. Moreover, any variable designated in the fragment as a hypothesis variable must be included in H. The hypothesis partition of a multi-fragment is required to satisfy hypothesis variable consistency with each of its component hypothesis-conditioned fragments.

When a fragment subsumes ν, its hypothesized subset contains ν, which implies that the fragment defines local distributions for its resident variables given each state of ν. Each resident variable of a multi-fragment must be resident in a hypothesis-conditioned fragment subsuming ν for each hypothesis element ν of the multi-fragment's hypothesis partition. This condition ensures that complete local distributions are specified for all resident variables in the multi-fragment.

Finally, for each hypothesis-conditioned fragment F and each hypothesis element ν, F must either subsume ν or be disjoint from ν. This condition ensures that if a distribution is defined by F for some states in ν, then F defines distributions for all states in ν. If this condition is not satisfied by a set D of fragments and a hypothesis partition S = (H, Δ), there exists a refinement Δ′ of Δ for which it is satisfied.

Definition 5: Let X be a node and let S = (H, Δ) be a hypothesis partition. An influence combination method for X is a function which takes as input a set D of hypothesis-conditioned fragments and a hypothesis element ν ∈ Δ, and which satisfies:

· An error is returned unless: (1) X is resident in at least one fragment in D subsuming ν; (2) X is resident only in fragments in D which either subsume ν or are disjoint from ν; and (3) the enabling conditions specific to the influence combination method are satisfied.

· Otherwise, the function returns a set of parents for X and a local distribution for X.

· The parents returned for X are the variables with arcs into X in the graph union of the fragment graphs for the fragment instances in D subsuming ν.

· The local distribution for X is computed using the influence functions for X from the fragment instances in which X is resident and that subsume ν.

· The parents and local distribution returned for X depend only on those fragments in D in which X is resident and which subsume ν.

The following definitions provide conditions under which a set of fragments can be combined into a compound fragment that unambiguously defines a probability distribution over its resident variables given its inputs.

Definition 6: Let S = (H, Δ) be a hypothesis partition, ν ∈ Δ a hypothesis element, and D a set of hypothesis-conditioned fragment instances. D is acyclic given ν if the graph union of the fragment graphs for all fragment instances in D that subsume ν contains no directed cycles.

Definition 7: Let X be a random variable instance, S = (H, Δ) a hypothesis partition, ν ∈ Δ a hypothesis element, and D a set of hypothesis-conditioned fragment instances. D satisfies home fragment consistency for X and ν if the influence combination method for X returns without error for D and ν.

Definition 8: Let S = (H, Δ) be a hypothesis partition, ν ∈ Δ a hypothesis element, and D a set of hypothesis-conditioned fragment instances. Then D is globally consistent given ν if the following conditions are satisfied for each X resident in at least one fragment of D that subsumes ν: (1) D is acyclic given ν; (2) F and ν are hypothesis variable consistent for each F ∈ D; (3) D is home fragment consistent for X and ν.
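Definition 6's acyclicity condition can be checked on the union of the subsuming fragments' graphs. The sketch below, building on the fragment sketches above with helper names of our own, forms the graph union and applies a Kahn-style topological check.

```python
from typing import Dict, Iterable, Set

def graph_union(fragments: Iterable["ElementaryFragment"]) -> Dict[str, Set[str]]:
    """A node's parents in the union are its parents in any fragment graph."""
    union: Dict[str, Set[str]] = {}
    for frag in fragments:
        for child, parents in frag.graph.items():
            union.setdefault(child, set()).update(parents)
            for parent in parents:
                union.setdefault(parent, set())
    return union

def is_acyclic(graph: Dict[str, Set[str]]) -> bool:
    """Repeatedly delete parentless nodes; a cycle is present iff some remain."""
    remaining = {node: set(parents) for node, parents in graph.items()}
    while remaining:
        roots = [node for node, parents in remaining.items() if not parents]
        if not roots:
            return False            # every remaining node has a parent: a cycle
        for root in roots:
            del remaining[root]
        for parents in remaining.values():
            parents.difference_update(roots)
    return True

def acyclic_given(D: Iterable["ElementaryFragment"], nu) -> bool:
    """Definition 6: D is acyclic given nu."""
    return is_acyclic(graph_union(f for f in D if subsumes(f, nu)))
```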

4.3 COMPOUND FRAGMENTS

A globally consistent set of fragment instances can be combined into a compound fragment as defined below. Compound fragments differ from elementary fragments in that compound fragments have no influence functions of their own, but point to their component fragments, where the influence functions reside. The local distribution for a variable in the compound fragment is computed by calling the variable's influence combination method, which in turn calls the variable's influence functions from the component fragments in which the variable is resident. Maintaining pointers to the component fragments facilitates incremental model construction and permits computation of the local distributions to be deferred until needed by the inference algorithm.

Definition 9: Let S = (H, Δ) be a hypothesis partition, let ν ∈ Δ be a hypothesis element, and let D be a globally consistent set of hypothesis-conditioned fragment instances. The compound fragment F_{D,ν} is an object with the following attributes:

· Hypothesis variables H;

· Hypothesis element ν;

· Resident variables R, consisting of those variables resident in at least one fragment in D;

· Input variables I, consisting of variables that are input to at least one fragment in D and resident in no fragment in D.

· Fragment graph, consisting of an acyclic directed graph G defined as follows. The nodes in G are given by I ∪ R, where I and R are defined above. All nodes in I are root nodes. The parents of a node X in R are the variables returned as parents by the influence combination method of X applied to D and ν.


· Component fragments, consisting of all elementary fragments in D together with all component fragments of compound fragments in D.

· Local distributions for resident variables. These may be left unspecified. If specified, the local distribution for X is computed by applying the influence combination method for X to D and ν.

It is clear from the above definition that fragment combination is order-independent. It may be useful when models are constructed incrementally to permit the knowledge base designer to define incremental influence combination methods. Incremental influence combination would compute the local distribution for a compound fragment from the local distributions of input compound fragments, together with the influence functions of input elementary fragments.
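Assuming the global-consistency checks of Definition 8 have already passed, compound-fragment construction reduces to set unions plus the graph union above; because set union is commutative and associative, the order-independence noted in the text falls out directly. The class and function names in this sketch are ours.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple

State = Tuple[str, ...]

@dataclass
class CompoundFragment:
    """Sketch of a compound fragment (Definition 9)."""
    hypothesis: Set[str]                     # H
    element: FrozenSet[State]                # nu
    resident: Set[str]                       # R
    inputs: Set[str]                         # I
    graph: Dict[str, Set[str]]               # fragment graph (child -> parents)
    components: List["ElementaryFragment"]   # pointers; distributions deferred

def combine(D: List["ElementaryFragment"], H: Set[str],
            nu: FrozenSet[State]) -> CompoundFragment:
    """Combine a globally consistent set D into a compound fragment."""
    resident = set().union(*(f.resident for f in D))
    inputs = set().union(*(f.inputs for f in D)) - resident
    graph = graph_union(f for f in D if subsumes(f, nu))
    # Local distributions are left unspecified here; the compound fragment
    # keeps pointers to its components, whose influence functions are consulted
    # only when the inference algorithm needs the distributions.
    return CompoundFragment(H, nu, resident, inputs, graph, list(D))
```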

5. MULTI-FRAGMENTS

Representing knowledge as hypothesis-conditioned fragments is convenient when a different fragment graph structure applies for different states of the hypothesis variables. To represent such a model as a standard Bayesian network or network fragment would require a more complex structure than the individual, simpler structures associated with the subsets. For some problems, knowledge representation, knowledge elicitation, and data entry may be significantly simplified by the hypothesis-conditioned fragment representation. Most of the models in our current knowledge base are hypothesis-conditioned fragments, and many of the interesting inference tasks require combining these hypothesis-conditioned fragments into multi-fragments. For example, the imagery report R2 described in Section 2 refers to a unit of unknown type. One possibility for the unit's type is an SA6 battery. The hypothesis-conditioned fragments of Figure 1 would be retrieved for reasoning about the unit's activity and location under the hypothesis that it is an SA6 battery, as would fragments for the other possibilities for the unit's type.

As in a Bayesian multinet, all resident variables in a multi-fragment must have distributions defined for all hypotheses in the multi-fragment's hypothesis partition. In our domain, there are many variables that exist only for some values of a hypothesis (e.g., radar mode, which is only defined if the unit is of a type that has a radar). We handle these variables by defining their state as the special state NA in hypothesis-conditioned fragments in which the variable is not defined.

Hypothesis-conditioned fragments may be combined by multi-fragment combination, as defined below.

Definition 10: Let S = (H, Δ) be a hypothesis partition and let D be a set of hypothesis-conditioned fragment instances that is globally consistent given ν for each hypothesis element ν ∈ Δ. Then the multi-fragment with component fragments D and hypothesis partition S is an object with the following attributes:

· Resident variables: the set R of variables resident in at least one fragment in D.

· Input variables: the set I of variables that are input to at least one fragment in D and resident in no fragment in D.

· Fragment graph: an acyclic directed graph G defined as follows. The nodes in G are given by I ∪ R, where I and R are defined above. All nodes in I are root nodes. The parents of a node X in R are all variables returned as parents by the influence combination method of X applied to D and ν for some hypothesis element ν.

· Component fragments: all elementary hypothesis-conditioned fragment instances in D together with all component fragments of compound hypothesis-conditioned fragment instances in D.

· Local distributions: for each resident variable X, a local distribution may be represented explicitly with the multi-fragment. If specified, it is computed by applying the influence combination method for X to D and ν for all hypothesis elements ν.

A multi-fragment defines a probability distribution over its resident variables given its input variables. The multi-fragment representation permits a knowledge base designer to exploit asymmetric independencies in a domain to specify a set of interrelated, structurally simple submodels that together comprise a probability model for a domain. Generally, the variables appearing as resident variables in a given multi-fragment will be ones for which the given partition of the hypothesis variables induces a simple network structure on the constituent fragments. Sometimes different partitions will induce simple structures for different sets of child variables. When this is the case, different multi-fragments may be defined over these different sets of variables. Multi-fragments may be combined with other multi-fragments to form compound fragments in a straightforward extension to Definition 9.

6. MODEL CONSTRUCTION

Model construction proceeds by retrieving fragment classes from a knowledge base, creating fragment instances in the model workspace, and combining them by the operations defined above. A model in the model workspace represents a complete probability model over its variables when the set of input variables is empty. A model is query complete for query Q = P(X_t | X_e = x_e) if the evidence variables X_e d-separate the target variables X_t from the input variables. The provision for default distributions for input variables permits approximate reasoning using incomplete models, as needed for anytime model construction.
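Query completeness can be tested with any standard d-separation routine. The sketch below uses the moralized-ancestral-graph criterion, which is equivalent to d-separation; the model interface matches the CompoundFragment sketch above and is, again, our own assumption.

```python
from typing import Dict, Set

def d_separated(graph: Dict[str, Set[str]],
                X: Set[str], Z: Set[str], Y: Set[str]) -> bool:
    """True iff Z d-separates X from Y in the DAG `graph` (child -> parents)."""
    # 1. Restrict to the ancestral set of X, Y, and Z.
    ancestors = set(X) | set(Y) | set(Z)
    changed = True
    while changed:
        changed = False
        for child, parents in graph.items():
            if child in ancestors and not parents <= ancestors:
                ancestors |= parents
                changed = True
    # 2. Moralize: marry co-parents and drop edge directions.
    adjacency: Dict[str, Set[str]] = {v: set() for v in ancestors}
    for child, parents in graph.items():
        if child not in ancestors:
            continue
        ps = parents & ancestors
        for p in ps:
            adjacency[p].add(child)
            adjacency[child].add(p)
            for q in ps:
                if p != q:
                    adjacency[p].add(q)
    # 3. Z d-separates X from Y iff Z blocks every path in the moral graph.
    frontier = list(X - Z)
    seen = set(frontier)
    while frontier:
        node = frontier.pop()
        if node in Y:
            return False
        for neighbor in adjacency[node]:
            if neighbor not in Z and neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return True

def query_complete(model: "CompoundFragment",
                   targets: Set[str], evidence: Set[str]) -> bool:
    """Query complete: the evidence d-separates targets from the inputs."""
    return d_separated(model.graph, targets, evidence, model.inputs)
```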

For knowledge bases encoding modularized template models, model construction means selecting which parts of the template model to bring into the model workspace. Variables that are d-separated by observed variables from target variables need not be explicitly represented. Some search algorithms involve computing or approximating bounds on the influence of a variable to decide whether the computation involved in extending the model is justified by the potential improvement in accuracy.

In our application, model construction involves additional issues, among them the problems of data association, hypothesis management, and pattern replication. Data association is the problem of deciding which domain entity a piece of evidence refers to. An example of data association is reasoning about which already-hypothesized SAM unit, if any, should be associated with an intelligence report indicating a SAM unit. Hypothesis management is the problem of generating and pruning hypotheses about domain entities and their interrelationships. Pattern replication refers to the need to make multiple copies of a model to refer to different domain entities or different instants in time for a temporal reasoning problem. Our representation framework was developed to support these model construction functions, although they are not treated in the present paper.

We have implemented a simplified version of the fragment combination operations of Section 4 in the PRIDE® system, developed a library of fragments for the situation assessment domain, and are developing an object-oriented database schema for our fragment library.

Acknowledgments

The research reported in this paper was sponsored by DARPA and the U.S. Army Topographic Engineering Center under contract DACA76-93-0025 to Information Extraction and Transport, Inc. The authors extend grateful acknowledgment to Tod Levitt, Daphne Koller, and three anonymous reviewers for helpful comments and suggestions on earlier versions of this paper.

References

Boutilier, C., Friedman, N., Goldszmidt, M., and Koller, D. (1996) Context-Specific Independence in Bayesian Networks. In Uncertainty in Artificial Intelligence: Proceedings of the Twelfth Conference. San Francisco, CA: Morgan Kaufmann, pp. 115-123.

Breese, J.S. (1987) Knowledge Representation and Inference in Intelligent Decision Systems. Ph.D. dissertation, Department of Engineering-Economic Systems, Stanford University.

Geiger, D. and Heckerman, D. (1991) Advances in Probabilistic Reasoning. In Uncertainty in Artificial Intelligence: Proceedings of the Seventh Conference, B. d'Ambrosio, P. Smets, and P. Bonissone (eds.). San Mateo, CA: Morgan Kaufmann.

Goldman, R.P. and Charniak, E. (1993) A Language for Construction of Belief Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(3), 196-207.

Heckerman, D. (1993) Causal Independence for Knowledge Acquisition and Inference. In Uncertainty in Artificial Intelligence: Proceedings of the Ninth Conference. San Mateo, CA: Morgan Kaufmann, pp. 122-127.

Jaakkola, T. and Jordan, M. (1996) Fast Learning by Bounding Likelihoods in Sigmoid-Type Belief Networks. In Advances in Neural Information Processing Systems 8. Cambridge, MA: MIT Press.

Jensen, F.V. (1996) Bayesian Networks. New York: Springer-Verlag.

Koller, D. and Pfeffer, A. (1997) Object-Oriented Bayesian Networks. This volume.

Laskey, K.B., Stanford, S., and Stibio, B. (1994) Probabilistic Reasoning for Assessment of Enemy Intentions. Under revision for IEEE Transactions on Systems, Man and Cybernetics.

Levitt, T.S., Agosta, J.M., and Binford, T.O. (1989) Model-Based Influence Diagrams for Machine Vision. In M. Henrion, R.D. Shachter, L.N. Kanal, and J.F. Lemmer (eds.), Uncertainty in Artificial Intelligence 5. Amsterdam: North-Holland, pp. 371-388.

Levitt, T.S., Binford, T.O., and Ettinger, G.J. (1990) Utility-Based Control for Computer Vision. In R.D. Shachter, L.N. Kanal, T.S. Levitt, and J.F. Lemmer (eds.), Uncertainty in Artificial Intelligence 4. Amsterdam: North-Holland, pp. 473-480.

Mahoney, S.M. and Laskey, K.B. (1996) Network Engineering for Complex Belief Networks. In Uncertainty in Artificial Intelligence: Proceedings of the Twelfth Conference. San Francisco, CA: Morgan Kaufmann, pp. 389-396.

Neal, R. (1992) Connectionist Learning of Belief Networks. Artificial Intelligence 56, 71-113.

Ngo, L., Haddawy, P., and Helwig, J. (1996) A Theoretical Framework for Context-Sensitive Temporal Probability Model Construction with Application to Plan Projection. In Uncertainty in Artificial Intelligence: Proceedings of the Twelfth Conference. San Francisco, CA: Morgan Kaufmann, pp. 419-426.

Pearl, J. (1988) Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann.

Pradhan, M., Provan, G., Middleton, B., and Henrion, M. (1994) Knowledge Engineering for Large Belief Networks. In Uncertainty in Artificial Intelligence: Proceedings of the Tenth Conference, R. Lopez de Mantaras and D. Poole (eds.). San Mateo, CA: Morgan Kaufmann.

Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, F., and Lorensen, W. (1991) Object-Oriented Modeling and Design. Englewood Cliffs, NJ: Prentice-Hall.

Wellman, M.P., Breese, J.S., and Goldman, R.P. (1992) From Knowledge Bases to Decision Models. Knowledge Engineering Review, 7(1), 35-53.