

Efficient Reasoning

RUSSELL GREINER, University of Alberta

CHRISTIAN DARKEN and N. IWAN SANTOSO, Siemens Corporate Research

Many tasks require “reasoning”—i.e., deriving conclusions from a corpus of explicitly stored information—to solve their range of problems. An ideal reasoning system would produce all-and-only the correct answers to every possible query, produce answers that are as specific as possible, be expressive enough to permit any possible fact to be stored and any possible query to be asked, and be (time) efficient. Unfortunately, this is provably impossible: as correct and precise systems become more expressive, they can become increasingly inefficient, or even undecidable. This survey first formalizes these hardness results, in the context of both logic- and probability-based reasoning, then overviews the techniques now used to address, or at least side-step, this dilemma.

Categories and Subject Descriptors: I.2.3 [Computing Methodologies]: Artificial Intelligence—Deduction and Theorem Proving—Answer/reason extraction, Inference Engines, Probabilistic Reasoning; I.2.4 [Computing Methodologies]: Artificial Intelligence—Knowledge Representation Formalisms and Methods—Bayesian Belief Nets, Rule-based Systems

General Terms: Performance, Algorithms

Additional Key Words and Phrases: Efficiency trade-offs, soundness/completeness/expressibility

1. INTRODUCTION

Many information systems use a corpus of explicitly stored information (a.k.a. a “knowledge base,” KB) to solve their range of problems. For example, medical diagnostic systems use general facts about diseases, as well as the specific details of a particular patient, to determine which

Russell Greiner gratefully acknowledges financial support from Siemens Corporate Research.
Authors’ addresses: R. Greiner, Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada, e-mail: [email protected]; C. Darken and N. I. Santoso, Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works, requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept, ACM Inc., 1515 Broadway, New York, NY 10036 USA, fax +1 (212) 869-0481, or [email protected].
©2001 ACM 0360-0300/01/0300-0001 $5.00

diseases the patient might have, and which treatment is appropriate. Similarly, configuration and synthesis systems use their stored descriptions of various components, along with the specifications for a proposed device (VLSI chip, software program, factory, or whatever), to design a device that satisfies those requirements. Scheduling and planning systems likewise

ACM Computing Surveys, Vol. 33, No. 1, March 2001, pp. 1–30.


synthesize schedules sufficient to achieve some specified objective.

In each case, the underlying system must reason—that is, derive conclusions (e.g., diagnoses, designs, schedules) that are sanctioned by its knowledge. Typically, an expert first, at “compile time,” provides a corpus of general background facts—about medicine, or types of components, etc. At “run time,” a user then specifies the details of a specific situation (e.g., the symptoms of a specific patient, or the specifications of a particular desired artifact), and then poses some specific questions (Which disease? What should be connected to what? . . . ); the reasoning system then produces appropriate answers, based on its current KB, which includes both the specifics of this problem and the general background knowledge. As there is a potentially infinite set of possible situations, these conclusions are typically not explicitly stored in the KB, but instead are computed as needed. This computation is called “reasoning” (aka “derivation,” “deduction,” “inference”).

In general, we will identify a reasoner with its symbolic knowledge base KB; the user can pose queries χ to that reasoner and receive answers—e.g., that χ is true or not. Section 2 motivates the use of such symbolic knowledge-based reasoners, and presents broad categories of such systems: logic-based (typically using Horn clauses) and probabilistic (using Bayesian belief nets). It also argues that we should evaluate a reasoner based on its facility in answering queries, using as quality measures: correctness, precision, expressiveness, and efficiency.

We clearly prefer a reasoner that always returns all-and-only the correct and precise answers, immediately, to arbitrary queries. Unfortunately, we will see that this is not always possible (Section 2.4). Many implemented reasoning systems, therefore, sacrifice something—correctness, precision, or expressiveness—to gain efficiency. The remainder of this paper presents various approaches: Section 3 (resp., Section 4, Section 5) overviews ways of improving worst-case efficiency by reducing

expressiveness (resp., by allowing imprecise answers, by allowing occasional incorrect responses). Section 6 considers ways of producing (expressive, precise, and correct) systems whose “average-case” efficiency is as high as possible. It also discusses ways to produce a system with high average performance, where the “performance” measure is a combination of these various criteria. Appendix A provides additional relevant details about belief nets.

2. SYMBOLIC REASONERS

2.1 Why Symbolic Reasoners?

In general, the user will interact with a knowledge-based symbolic reasoner via two subroutines: Tell(KB, χ), which tells the reasoner to extend its knowledge base KB to include the new information χ; and Ask(KB, χ), which asks the reasoner whether χ is true—here the reasoner’s answer will often convey other information (such as a binding, or a probability value) to the user [Levesque 1984].1

The underlying knowledge base is “symbolic,” in that each of its individual components, in isolation, has “semantic content”—e.g., the KB may contain statements about the world, perhaps in propositional logic, predicate calculus, or some probabilistic structure (details below). (This is in contrast to, for example, knowledge encoded as a numeric function, perhaps embodied in a neural net.2)

There are many reasons why such a symbolic encoding is useful:

1 To be completely general, we may also have to include routines that retract some assertions, or in general revise our beliefs [Alchourron et al. 1985]; we will not consider this issue here.
2 In some neural net systems, like KBANN [Towell and Shavlik 1993], the nodes do have semantic content, in that they refer to some event in the real world. The link-weights, however, are not semantic—their values are set only to provide accurate predictions. In general, there is no way to determine whether the weight on the A → B link should be 0.7 vs 0.8 vs 0.0001, except in reference to the other weights, and with respect to some set of specific queries.


Explanation: Many systems have to interact with people. It is often crucial that these systems be able to justify why they produced their answers (in addition, of course, to supplying the correct answer). Many symbolic reasoners can convey their decisions, and justifications, to a person using the terms they used for their computations; as these terms have semantics, they are typically meaningful to that user.

Construction (as well as debugging and maintaining): As the individual components of the system (e.g., rules, random variables, conditional probabilities, . . .) are meaningful to people in isolation, it is relatively easy for an expert to encode the relevant features. This semantics also helps an expert to “debug,” or update, a problematic knowledge base—here again a domain expert can examine a single specific component, in isolation, to determine whether it is appropriate.

Handling partial information—or, not, exists, as well as distributions: Many formalisms, including both ones that admit explicit reasoning and others, are capable of dealing with complete knowledge, corresponding to conjunctions and universally quantified statements—e.g., “gender = female and disease = meningitis”; or “everyone with meningitis is jaundiced.” In many situations, however, we may only have partial knowledge: “disease is either meningitis or hepatitis”; or “some people with meningitis are not jaundiced.” Most logics, including any containing propositional logic, can readily express this information, using disjunction and negation (“or”s and “not”s). Yet more expressive systems, such as predicate calculus, can also deal with existentials. We may also want to explicitly state how confident we are of some claim, perhaps using probabilities.

2.2 Broad Categories of Reasoning Systems

There are many standard formalisms for encoding semantic information, each with its associated type of reasoning. This report will consider the following two major categories.3

1. (Sound and Monotonic) Logical Reasoning: This formalism assumes we have precise discrete descriptions of objects, and that we expect to obtain precise categorical answers to the questions posed. In particular, these systems can provide assurances that their answers will be “correct”—that is, if you believe the input facts, then you have to believe the conclusions.

In more detail: This category includes propositional logic and first order logic (predicate calculus), as well as higher order logics [Genesereth and Nilsson 1987; Enderton 1972; Chang and Lee 1973]. As an example, a logical system may include facts like “Any woman of child-bearing age, with a distended abdomen, is pregnant.” If we then assert that a particular patient is female, is of the correct age, and has a distended abdomen, the reasoner can then conclude that she is pregnant.

This is because the known facts

    F = { ∀x Woman(x) ∧ CBA(x) ∧ DisAbd(x) ⇒ Pregnant(x),
          Woman(Wilma), CBA(Wilma), DisAbd(Wilma) }

logically entails Pregnant(Wilma), written F |= Pregnant(Wilma). A reasoning process ⊢α is “correct” (here, aka “sound”) if, for any sets of propositions Φ and Σ,

    Φ ⊢α Σ  ⇒  Φ |= Σ

that is, ⊢α only allows a reasoner to conclude “true” facts.

Typical logic-based systems use a collection of “inference rules” (possibly augmented with rewrite rules) to infer new statements from an existing KB. If each of these rules is sound (read “truth preserving”), then the resulting extended KB′ will be as correct as the initial KB.

3 As many readers may not be familiar with the second category, this report will provide more details here; see also the Appendix.


In 1965, Robinson [1965] proved that one can express any logical (first order) inference in terms of resolution. The ongoing challenge has been to find this proof as efficiently as possible—see Section 2.3.4

There are also many systems that use other sound deductive techniques, such as natural deduction (THINKER [Pelletier 1986], MIZAR [Rudnicki 1992], and ONTIC [McAllester 1989]) and/or equational logic (OTTER [McCune and Wos 1997]). Note that most of these systems also use resolution, in addition to their more specialized inference rules. Also, many of these systems are better viewed as “proof checkers” rather than “theorem provers,” as their main function is to verify that a proposed proof is legitimate. Hence, they hope to gain efficiency by sharing the burden with a (hopefully insightful) person. For brevity, this report will not further discuss such approaches.

Non-monotonic Logical Reasoning: Standard logical inference is monotonic, in that new information will never cause the reasoner to retract any conclusion. For example, after deriving that “Wilma is pregnant” from our general medical information together with information specific to Wilma, finding new information will not change this conclusion. This is not always appropriate, as subsequent information can compromise prior conclusions. (Imagine, for example, finding that Wilma had a hysterectomy.)

Note that this does not mean the earlier conclusion was inappropriate; given that earlier store of knowledge, it was probably the correct interpretation. It is useful, however, to allow the reasoner to change its mind, given new data.

This intuitive concept of “nonmonotonicity” is well motivated, and seems essential to common-sense reasoning. While there is now a vast literature on this topic, including work on default reasoning [Reiter 1987], circumscription [McCarthy 1980], autoepistemic logics [Marek and Truszczynski 1989],

4 See Genesereth and Nilsson [1987] for a general discussion of these ideas—inference rules, soundness, resolution.

Table I. Joint Probability Distribution

J  B  H  P(J, B, H)
0  0  0  0.03395
0  0  1  0.0095
0  1  0  0.00105
0  1  1  0.1805
1  0  0  0.01455
1  0  1  0.038
1  1  0  0.00045
1  1  1  0.722

as well as many variants (see also Bobrow [1980] and Ginsberg [1987]), it has proven very difficult to provide an effective implementation.5 The major problem is understanding how to represent and use defeasible statements. That is, while we know how to deal with statements of the form “All birds fly,” it is not so clear how to deal with the claim that “A bird, by default, flies”: here, we do want to conclude initially that the bird Tweety will fly, but then reverse this conclusion later, on finding that Tweety is a penguin, or has a broken wing, or . . .

Fortunately, there are alternative approaches, which have precise definitions as well as implementations; see the decision-theoretic system mentioned below. This survey will therefore not further discuss “non-monotonic reasoning” formalisms.

2. Probabilistic Reasoning: Many forms of information are inherently probabilistic—for example, given certain symptoms, we may be 80% confident the patient has hepatitis, or given some evidence, we may be 10% sure a specific stock will go up in price.

One possible downside of dealing with probabilities is the amount of information that has to be encoded: in general one may have to express the entire joint distribution, which is exponential in the number of features; see Table I. For many years, this observation motivated researchers to seek ways to avoid dealing with probabilities.

5 The most visible implementation is “Theorist” [Poole et al. 1987], which handles a (useful and intuitive) subset of default reasoning.


Fig. 1. Simple (but not minimal) belief network.

In many situations, however, there can be more laconic ways to express such information, by “factoring” the joint distribution. This has led to “belief nets” (aka “causal nets,” “probability networks,” “Bayesian nets”), which over the last decade have become the representation of choice for dealing with uncertainty [Pearl 1988; Shafer and Pearl 1990; Charniak 1991].

To make this concrete, consider the claims that Hepatitis “causes” Jaundice and also “causes” a Bloodtest to be positive, in that the chance of these symptoms will increase if the patient has hepatitis. We can represent this information, using the full joint over these three binary variables (see Table I for realistic, if fabricated, numbers), then use this information to compute, for example, P(h | ¬b)—the posterior probability that a patient has hepatitis, given that he has a negative blood test.6 The associated computation,

    P(h | ¬b) = P(h, ¬b) / P(¬b)
              = ( Σ_{X∉{H,B}} Σ_{x∈X} P(h, ¬b, X = x) ) / ( Σ_{X∉{B}} Σ_{x∈X} P(¬b, X = x) )

involves the standard steps of marginalization (the summations shown above) to

6 Note we are identifying a node with the associated variable. We will also use lower-case letters for values of the (upper case) variables; hence H = h means the variable H has the value h. We will sometimes abbreviate P(H = h) as P(h). Finally, “¬h” corresponds to h = 0, and “h” to h = 1.

deal with unspecified values of various symptoms, and conditionalization (the division) to compute the conditional probability; see Feller [1966]. In general, we will ask for the distribution of the “query variable” (here H) given the evidence specified by the “evidence variables” (here B = ¬b).
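To make the two steps concrete, here is a minimal Python sketch (ours, not from the paper) that computes P(h | ¬b) directly from the Table I joint: the helper sums out any unspecified variables (the marginalization step), and the final division is the conditionalization step.

    # Keys are (J, B, H) value triples; probabilities are from Table I.
    joint = {
        (0, 0, 0): 0.03395, (0, 0, 1): 0.0095,
        (0, 1, 0): 0.00105, (0, 1, 1): 0.1805,
        (1, 0, 0): 0.01455, (1, 0, 1): 0.038,
        (1, 1, 0): 0.00045, (1, 1, 1): 0.722,
    }

    def prob(j=None, b=None, h=None):
        # Marginalization: sum every entry consistent with the specified
        # values; None means "sum this variable out."
        return sum(p for (jj, bb, hh), p in joint.items()
                   if (j is None or jj == j)
                   and (b is None or bb == b)
                   and (h is None or hh == h))

    # Conditionalization: P(h | not-b) = P(h, not-b) / P(not-b); here ~0.495.
    posterior = prob(h=1, b=0) / prob(b=0)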

A Belief Network would represent this as a graphical structure, whose nodes represent probabilistic variables (such as “Hepatitis”), and whose directed links, roughly speaking, represent “causal dependencies,” with the understanding that there should be a directed path from A to B, possibly a direct connection, if knowing the value of A can help specify the value for B. In particular, each node B has an associated “Conditional Probability Table” (aka “CPtable”) that specifies the conditional distribution for B, given every assignment to B’s parents; see Figure 1.

(For general binary-valued variables, the CPtable for a node X with k parents {Y1, . . . , Yk} will include 2^k rows—one for each of the 2^k possible assignments to Y⃗ = 〈Y1, . . . , Yk〉—and 2 columns, one for each possible value for X. Here, the 〈i, j〉 entry in this table will specify the conditional probability P(X = i | Y⃗ = j⃗), where j⃗ represents the jth assignment to Y⃗ (here j⃗ ∈ {〈0, . . . , 0, 0〉, 〈0, . . . , 0, 1〉, . . . , 〈1, . . . , 1〉}). Note the final column of the table is superfluous, as each row must add up to 1. Of course, these ideas generalize to general ℓ-ary variables. There is a similar approach when dealing with continuous variables [Pearl 1988].)
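As a quick illustration of these sizes, the following sketch (ours) enumerates the 2^k parent assignments that index the CPtable of a binary node with k binary parents; with the redundant final column dropped, each such node carries 2^k free parameters, versus the 2^n − 1 needed for a full joint table over n binary variables.

    from itertools import product

    k = 3
    parent_assignments = list(product((0, 1), repeat=k))  # the 2**k rows
    free_parameters = len(parent_assignments)             # one P(X=1 | ...) per row
    print(free_parameters)                                # 8, i.e. 2**3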


Fig. 2. Corresponding, but minimal, belief network.

Notice, however, there is some “redundancy” in the network shown in Figure 1: given that the patient α has hepatitis, the probability that α has jaundice does not depend on whether α had a positive blood test—i.e., P(J = 1 | H = 1, B = 1) = 0.8 = P(J = 1 | H = 1, B = 0) and P(J = 1 | H = 0, B = 1) = 0.3 = P(J = 1 | H = 0, B = 0). This means that jaundice is independent of blood test, given hepatitis.
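The next sketch (ours; it reuses the joint table and the prob helper from the earlier fragment) recovers the five parameters of Figure 2 from Table I—P(H), P(J | H), and P(B | H)—and checks that their product reproduces the full joint, which is exactly what the factored network asserts.

    p_h1 = prob(h=1)  # prior P(H = 1)
    p_j1 = {h: prob(j=1, h=h) / prob(h=h) for h in (0, 1)}  # P(J=1 | H=h)
    p_b1 = {h: prob(b=1, h=h) / prob(h=h) for h in (0, 1)}  # P(B=1 | H=h)

    def factored(j, b, h):
        # P(J, B, H) = P(H) * P(J | H) * P(B | H), using just 5 parameters.
        ph = p_h1 if h == 1 else 1 - p_h1
        pj = p_j1[h] if j == 1 else 1 - p_j1[h]
        pb = p_b1[h] if b == 1 else 1 - p_b1[h]
        return ph * pj * pb

    assert all(abs(factored(*key) - v) < 1e-9 for key, v in joint.items())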

This reduction—called “factoring”—allows us to use a simpler network, shown in Figure 2.7 These factored representations include only the “relevant” connections: only include A → B if A can directly influence B. This means the resulting network typically requires the user to specify fewer links, and fewer parameters (CPtable entries), and means the inference process (discussed in Appendix A.2) can generally be performed more efficiently. While the saving here is relatively small (2 links rather than 3, and a total of 5 parameters, rather than 7), the savings can be very significant for larger networks. As a real-world example, the complete joint distribution for the Alarm belief net [Beinlich et al. 1989], which has 37 nodes and 47 arcs, would require approximately 10^17 parameters in the naïve tabular representation—à la

7 This figure includes only the P(χ = 1 | . . .) entries; it omits the superfluous P(χ = 0 | . . .) columns of the CPtables, as these values are always just 1 − P(χ = 1 | . . .).

Table I. The actual belief net, however, only includes 752 parameters.8 Essentially all of the ideas expressed in this paper apply also to related techniques for factoring a distribution; see especially the work on HMMs [Rabiner and Juang 1986] and Factorial HMMs [Ghahramani and Jordan 1997], and a description of how HMMs are related to belief nets [Smyth et al. 1997].

Of course, not every distribution can be factored. We can still represent a nonfactored distribution using a belief net, albeit one that uses the comprehensive set of parameters; e.g., 2^k − 1 parameters if all k variables are binary. That is, while a belief net can exploit a factorable distribution, this formalism does not force a representation to be factored if that is inappropriate.

When we can combine this notion of probability with utility functions (which specify the “goodness” of the various possible outcomes), the resulting decision-theoretic system can often address the issues of nonmonotonic inference discussed above. In particular, there is

8 There are other tricks, such as the “NoisyOr” representation, that can further reduce the number of parameters required [Pearl 1988]. For example, the well-known CPCS belief net would require 133,931,430 parameters if dealing with explicit CPtables. (Note this is still a huge savings over its unfactored table-form.) By using NoisyOr’s and NoisyMax’s, however, this network can be represented using only 8,254 parameters [Pradhan et al. 1994].


nothing problematic about deciding that action A is the “optimal decision, given data η,” but that action B (which may “contradict” action A) is appropriate given data η+1. Also, while there are many ways to deal with uncertainty, etc.—including fuzzy logic, Dempster–Shafer theory of evidence, and even some forms of Neural Nets—we will focus on systems based on (standard) notions of probability [Feller 1966].

There are many obvious connections between the logic-based and probability-based formalisms. For example, an “extension” to a knowledge base KB is a complete assignment of truth or falsity to each variable, such that the conjunction of these literals entails KB. (Hence, the assignment {¬A, B, C, ¬D} is an extension of the theory KB0 = {A ⇒ B, C ∨ D}, as ¬A ∧ B ∧ C ∧ ¬D |= KB0.) We can in general view the possible extensions of a knowledge base as a “qualitative” version of the atomic events (see Table I), with the understanding that each of these “possible worlds” has a non-0 probability, while each of the remaining possible “complete assignments” (e.g., {A, ¬B, ¬C, ¬D}) has probability 0 of occurring. Many, including Nilsson [1986], have provided formalisms that attempt to link these areas.
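Since an extension is just a satisfying complete assignment, brute-force model checking makes the example concrete; this small sketch (ours) enumerates the extensions of KB0 = {A ⇒ B, C ∨ D}.

    from itertools import product

    def satisfies_kb0(a, b, c, d):
        return ((not a) or b) and (c or d)  # A => B, and C or D

    extensions = [w for w in product((False, True), repeat=4)
                  if satisfies_kb0(*w)]
    # {not-A, B, C, not-D} is one of the 9 extensions ("possible worlds"):
    assert (False, True, True, False) in extensions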

Note that the standard models of probability—and hence, belief nets—can be viewed as a natural extension of propositional logic, as the fundamental unit being considered is a proposition (e.g., the particular subject is female, or is pregnant). Predicate calculus is more general than propositional logic, as its basic unit is an individual, which may have certain properties (e.g., perhaps the individual Mary has the property female, but not the property pregnant). There are several groups actively trying to extend probabilities to be more expressive; see Halpern [1990], Ngo and Haddawy [1995], Bacchus et al. [1996], Koller et al. [1997], Koller and Pfeffer [1997], Poole [1993b]. That fascinating work is beyond the scope of this survey.

For more information about belief nets, see Pearl [1988], or http://www.cs.ualberta.ca/~greiner/BeliefNets.html.

2.3 Challenges of Reasoning

A reasoning system must address (at least) the following two challenges: First, the system must somehow contain the relevant knowledge. There are two standard approaches to acquiring the information:

Knowledge acquisition: acquire the relevant information by interviewing one or more human domain experts; see Scott et al. [1991] for standard protocols and interviewing techniques.

Learning: gather the required information from “training data”—information that typically specifies a set of situations, each coupled with the correct response [Mitchell 1997]

or, quite often, a combination of both [Webb et al. 1999].

In any case, the system builder must ensure that the language used to encode the information is sufficient; e.g., if the color of an object is important (e.g., to making some decision), then the language must include something like a “colorOf(·, ·)” predicate. Even after deciding what to represent (i.e., the relevant ontology), the designer must also decide how to represent this information—e.g., “colorOf(·, red)” vs “red(·).” There have been some efforts to streamline this process, and to standardize the relevant components; see the KIF project, with its advocates [Neches et al. 1991; Genesereth and Fikes 1992] and opponents [Ginsberg 1991]. There are also more fundamental encoding issues, such as the decision to represent the world using a conjunction of propositions (i.e., a knowledge base), as opposed to an equivalent set (disjunction) of characteristic models; see Kautz et al. [1993], Khardon and Roth [1994]. (This is related to the dual representations of a waveform: “time domain” vs “frequency domain” [Bracewell 1978].)

The first issue (“what to represent”) is clearly crucial: if the designer produces a representation that cannot express some critical aspect of the domain, the reasoner will be unable to provide effective answers to some questions. The repercussions of not adequately dealing


with the other issue (e.g., using the time domain, rather than frequency; or vice versa) are not as severe, as everything that can be expressed one way can be expressed in the other. However, the different representations may differ in terms of their “naturalness” (i.e., people may find one more natural than another), and “efficiency.”

In this report, however, we assume that the available information (both general and situation-specific knowledge) is sufficient to reach the appropriate conclusion—although it may not be obvious how to reach (or even approximate) that conclusion; see later. As such, we will not directly consider the issues of learning domain knowledge,9 nor will we explicitly consider the related challenges of maintaining and updating this knowledge. This survey, instead, focuses on the second major challenge: efficiently producing all-and-only the correct answers to all relevant queries.

2.4 Quality Measures

We would like a reasoning system that is

correct: always returns the correct answer
precise: always returns the most specific answer
expressive: allows us to express any possible piece of information, and ask any possible question
efficient: returns those answers quickly.

See Greiner and Elkan [1991] and Doyle and Patil [1991].

Unfortunately, this is impossible: first order logic is not decidable. In particular, no “sound” and “complete” (read “correct and precise”) reasoning system can be decidable for a representation as expressive as arithmetic [Nagel and Newman 1958; Turing 1936].

There are also hardness results for less expressive systems: e.g., general propositional reasoning is NP-complete [Garey and Johnson 1979] (in fact, #P-hard [Valiant 1979]), as is probabilistic reasoning in the context of belief nets [Cooper 1990]. Moreover, even getting approximate answers from a belief net, within an additive factor of 1/2, is NP-hard [Dagum and Luby 1993], as is getting answers that are within a multiplicative factor of 2^(n^(1−ε)) for any ε > 0 [Roth 1996].

9 Although we will later briefly consider learning control knowledge; see Section 6.1.

We can view this as a general property of reasoning:

Fundamental Trade-off: The worst-case runtime efficiency of any correct-and-precise reasoning process increases monotonically with the expressiveness of the reasoner’s language.

Any system that wants to guarantee efficient reasoning must therefore sacrifice something—expressiveness, precision, or correctness. The next three sections consider the possibility of improving the worst-case efficiency by (resp.) reducing expressiveness, allowing imprecise answers, and allowing occasional incorrect responses; Section 6 then considers producing (expressive, precise, and correct) systems whose “average-case” efficiency is as high as possible. Of course, complete precision may be overkill for some tasks; e.g., to decide on our next action, we may just need to know whether or not P(cancer) > 0.5; here additional precision will not be additionally useful. In general, we can identify each task with a performance criterion, then evaluate a reasoning system based on this criterion. Section 6.3 addresses this range of issues.

3. IMPROVING WORST-CASE EFFICIENCY BY REDUCING EXPRESSIVENESS

This section discusses reasoning systems that reduce expressiveness to obtain guarantees of efficient performance.

Less Expressive Logic-Based Reasoners: Standard “database management systems” (DBMS) are very inexpressive, as they allow only conjunctions of positive


ground atomic literals [van der Lans 1989]. These systems do allow users to state “McD makes FrenchFries” and “Army makes Tanks,” and to answer questions that correspond to existentially quantified boolean combinations of such atomic literals. However, they do not allow the user to explicitly state claims of the form “McD makes either FrenchFries or Tanks,” “McD makes something” nor “McD does not make Tanks”—that is, typical DBMS do not allow disjunctions, negations, or existentials [Reiter 1978a]. The upside is that database “reasoning” (i.e., answering standard SQL queries) is efficient—at worst linear (in the size of the database).

Two comments are relevant here: First, linear efficiency may seem optimal, as it takes this much time simply to input the information. However, a clever reasoning system may be able to do better at query (run) time (i.e., in the Ask routine), if it has first done some appropriate work when the information was asserted (i.e., at “compile time”), via Tell. As an extreme, imagine a DBMS that explicitly stores the answers to all allowed queries, after all assertions have been entered; given a sufficiently narrow space of possible queries and a sufficiently good indexing scheme, a reasoner could answer queries in time considerably less than O(n)—this can be extremely important in practice, as linear complexity might still be too expensive for large databases. We will later return to the general issue of when to perform inference (Section 6.1).

Second, many database systems embody implicit assumptions that extend the set of queries that can be answered categorically. In particular, the “Closed World Assumption” allows a DBMS to conclude that “McD does not make Tanks” from a database that does not explicitly include the assertion “McD makes Tanks” [Reiter 1978b]; this is a special case of the “Negation As Failure” policy of many logic programs [Clark 1978]. The “Unique Names Assumption” allows a DBMS to conclude that “McD makes (at least) two products” from a database that includes “McD makes Hamburgers” and

“McD makes FrenchFries” [Reiter 1980], as this assumption allows us to conclude that Hamburgers ≠ FrenchFries from the fact that their names are distinct;10 see also Reiter [1987]. Note that these assumptions extend the set of queries that can be answered; they do not extend the information that the user can express, as (generally) the user does not have the option of not expressing these assertions.

“Semantic Nets” and “Frame-based Systems” are alternative artificial intelligence representation formalisms. In hindsight, we can view much of the research in these areas as guided by the objective of producing an efficient, if inexpressive, reasoning system [Findler 1979]. These systems in general allow only conjunctions of atomic unary- or binary-literals, as well as certain subclasses of simple 1-antecedent rules using such literals. The more recent work on “Terminological Logics” (aka “Description Logics”) is explicitly trying to formalize, and extend, such systems, towards producing a system whose language is as expressive as possible, with the constraint that its worst-case complexity must remain polynomial [Donini et al. 1997]. This relates to the notion of “complexity cliffs” [Levesque and Brachman 1985; Levesque 1984]: one can keep increasing the expressiveness of a language (by adding new “logical connectives,” or increasing the number of disjuncts allowed per clause, etc.) and retain worst-case polynomial complexity, until reaching a cliff—one more extension produces a system in which inference is NP-hard. Then one can add a set of additional connectives, etc., until reaching the next cliff, where the complete system goes from decidable (if NP-hard) to undecidable. See Figure 3.

10 Note that this claim is not always true: e.g., “2+2” and “4” are the same, even though they have different names; similarly, both “Professor Greiner” and “Russ” refer to the same thing; as do “MorningStar” and “EveningStar” [McCarthy 1977]. Hence, if there are k entries of the form McD makes xi, the Closed World Assumption allows a reasoner to conclude that McDonalds makes at most k items, and the Unique Names Assumption, that McDonalds makes at least k items.


Fig. 3. Complexity cliff (taken from Figure 5.3 of Poole et al. [1998]). (Reprinted with permission of Oxford University Press.)

As mentioned above, resolution is sufficient to answer any logic-based query. PROLOG is a specific embodiment of the “resolution” derivation process, which is honed to deal with a certain class of knowledge bases and queries. In particular, PROLOG deals only with “Horn clauses”—i.e., knowledge bases that can be expressed as conjunctions of disjunctions, where each disjunction includes at most one positive literal [Clocksin and Mellish 1981]. To motivate this restriction, note that, while propositional reasoning is NP-hard, there is a linear time algorithm for answering queries from a Horn database [Dowling and Gallier 1984].11 In exchange for this potentially exponential speed-up, however, there are

11 Moreover, PROLOG uses a type of “ordered resolution”; ordered resolution is refutation complete for Horn clauses [Genesereth and Nilsson 1987]. Also, Boros et al. [1990] and Dalal and Etherington [1992b] provide yet other syntactic situations for which reasoning is guaranteed to be efficient.

statements that cannot be expressed in PROLOG—in particular, one cannot state arbitrary disjuncts, e.g., that “Patient7 has either Hepatitis or Meningitis.” Moreover, PROLOG cannot prove that a (existentially quantified) query is entailed unless there is a specific instantiation of the variables that is entailed. Note this is not always the case: consider a tower of 3 blocks, with the green-colored block A immediately above B, and B immediately above the red-colored block C; see Figure 4.

Fig. 4. Reasoning by cases.


Now observe that the answer to the question “Is there a green block immediately above a nongreen block?” is yes, as this holds whether B is green (and hence the green B is above the nongreen C) or B is not green (and hence the green A is above the nongreen B) [Moore 1982]. Fortunately, in practice, these limitations are not that severe—very few standard tasks require such “reasoning by cases” and implicitly-specified answers.12
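To see why Horn inference is so much easier, consider the following sketch (ours) of the forward-chaining idea underlying the Dowling and Gallier [1984] result; for clarity this version rescans the clause list and so is quadratic, whereas their algorithm uses per-clause counters to achieve linear time.

    def horn_entails(clauses, query):
        # clauses: (body, head) pairs, each a definite Horn clause;
        # facts have an empty body. Returns True iff the KB entails query.
        derived = {head for body, head in clauses if not body}
        changed = True
        while changed:
            changed = False
            for body, head in clauses:
                if head not in derived and all(p in derived for p in body):
                    derived.add(head)
                    changed = True
        return query in derived

    kb = [((), "Woman"), ((), "CBA"), ((), "DisAbd"),
          (("Woman", "CBA", "DisAbd"), "Pregnant")]
    assert horn_entails(kb, "Pregnant")  # the Wilma example of Section 2.2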

There are also constraint programming systems [Freuder 1996] that attempt to accommodate a larger subset of clauses—of course, such systems cannot provide the efficiency guarantees that a Horn-clause system can.

Less Expressive Belief Nets, Probabilistic Queries: We now consider the work on probabilities and belief nets, focusing on the “belief updating” task: i.e., computing P(H = h | E1 = e1, . . . , Em = em), the posterior probability that the hypothesis variable H has value h, conditioned on some concrete evidence, of the form Cough = true, Temp = 100°, . . . .13

We note first that there are efficient inference processes associated with certain types of structures, and for certain types of queries. Inference is trivial for “naïve-bayes” classifiers, of the form shown in Figure 5. These (n+1)-node belief nets were motivated by the task of classification: i.e., assigning a “classification label” to an instance, specified by a set of (up to n) attribute values. Each such net includes one node to represent the “classification,” which is the (only) parent of each of the other nodes (the “attributes”).

12 Also, as PROLOG does not perform an “occurs check,” it is not sound—i.e., it can return an answer that is not correct. This too is for efficiency, as it means the unification procedure, running in the innermost loop, is O(k), rather than O(k^2), where k is the size of the largest literal [Genesereth and Nilsson 1987]. PROLOG also includes some “impurities,” such as negation-as-failure “not(·)” and cut “!”.
13 Many other standard tasks—such as computing the maximum a posteriori assignment to the variables given some concrete evidence—require similar algorithms, and have similar computational complexity [Dechter 1998; Abdelbar and Hedetniemi 1998].

Fig. 5. Example of “Naïve Bayes” structure.

While inference is guaranteed to be fast—O(r), where r ≤ n is the number of specified attributes; e.g., P(H = 1 | O1 = o1, . . . , Or = or)—these systems cannot express any general dependencies between the attributes, as their structure forces P(Oi | Oj, H) = P(Oi | H) for all i ≠ j.
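The O(r) claim is easy to see in code: here is a sketch (ours, with hypothetical numbers consistent with the earlier hepatitis example) of naïve-bayes belief updating, where the posterior over H is just the prior times one CPtable entry per specified attribute, renormalized.

    def naive_bayes_posterior(prior, cpt, observed):
        # prior: {h: P(H=h)}; cpt: {(attr, value, h): P(attr=value | H=h)};
        # observed: {attr: value}. One multiplication per observed attribute.
        score = {}
        for h, p in prior.items():
            for attr, value in observed.items():
                p *= cpt[(attr, value, h)]
            score[h] = p
        z = sum(score.values())
        return {h: s / z for h, s in score.items()}

    prior = {0: 0.05, 1: 0.95}
    cpt = {("Jaundice", 1, 0): 0.3, ("Jaundice", 0, 0): 0.7,
           ("Jaundice", 1, 1): 0.8, ("Jaundice", 0, 1): 0.2}
    print(naive_bayes_posterior(prior, cpt, {"Jaundice": 1}))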

There are also efficient algorithms for inference in the more general class of “poly trees”—i.e., belief nets that include at most a single (undirected) path connecting any pair of nodes [Pearl 1988]; see Appendix A.1. Notice this class strictly generalizes tree structures (and a fortiori, naïve-bayes structures) by ignoring the directions of the arrows, and by allowing more complex structures (e.g., allowing multiple root nodes, longer paths, etc.). However, there are still many dependencies that cannot be expressed.

Friedman et al. [1997] combine the ideas of naïve-bayes and poly-tree structures, to produce “Tree Augmented Bayesian Net,” or TAN, structures, that are typically used to classify unlabeled instances. These structures resemble naïve-bayes trees, but allow certain dependencies between the children (read “attributes”). To define these TAN structures: (1) There is a link from the single classification node down to every other “attribute” node. (2) Let BN′ be the structure obtained by removing these classification-to-attribute links from the initial TANetwork. This BN′ then is a poly-tree. See Figure 6.14

We close our comments about efficient structures with a few quick comments: (1) All three of these classes (naïve bayes, poly-tree, and TANetworks) can be

14 To see why there is an efficient inference algorithm, just observe that a TAN structure has a single-node cut-set; see Appendix A.2.2.


Fig. 6. A TAN model: dashed lines are from Naïve bayes; solid lines express correlation between attributes (taken from Figure 3 of Friedman et al. [1997]). (Reprinted with permission of Kluwer Ac. Pub.)

learned efficiently [Chow and Liu 1968], as well as admitting efficient reasoning. (This is a coincidence, as there are classes of networks that admit efficient inference but are hard to learn, and vice versa.) (2) Many system designers, as well as learning algorithms [Singh 1998; Sarkar and Murthy 1996], use the heuristic that “networks with fewer arcs tend to be more efficient,” as an argument for seeking networks with fewer connections.15 Of course, belief net inference can be NP-complete even if no node in the network has more than 3 parents [Cooper 1990]. (3) There are also efficient algorithms for reasoning from some other simple structures, such as “similarity networks” [Heckerman 1991].

These positive results deal with general queries, where only a subset of the possible conditioning variables have been set. There are also efficient ways to compute P(H | E1 = e1, . . . , Em = em) from any belief net, provided the evidence set {Ei} is comprehensive—i.e., includes all variables, or all variables other than H. Actually, it is easy to compute P(H | E1 = e1, . . . , Em = em) from a general belief net if the evidence {Ei} includes all of H’s parents and none of H’s descendants; or if {Ei} includes H’s “Markov blanket”: that is, all of H’s parents, H’s children, and H’s “co-parents” (all of the non-H immediate parents of all of H’s immediate children—e.g., referring to Figure 6, C and Pregnant are co-parents of Age; and C and Age are co-parents of Insulin).

15 Note this point is orthogonal to the goal of learning more accurate networks by using a regularizing term to avoid overfitting [Heckerman 1995].

By contrast, the most efficient known algorithms for answering general queries from general belief nets are exponential. The algorithms based on “junction trees” (aka clique-trees—cf., bucket elimination [Dechter 1998]) are exponential in the network’s “induced tree width”—a measure of the topology of the network [Arnborg 1985]. Another class of algorithms is based on “cut-set elimination.” Here, the complexity is exponential in the “min cut” (for d-separation) of the network. Appendix A.2 provides a quick summary of the major algorithms for computing posterior probabilities, using a belief net.

Summary: This section has considered several types of reasoners; each always produces correct and precise answers, over the situations it can accommodate: that is, if you can state the appropriate facts, and pose the appropriate question, then the system will produce all-and-only the correct answers, efficiently. There are, however, limitations on what can be stated and/or asked.

The next sections present other reasoners that attempt to remain as expressive as possible, but hope to gain efficiency by being imprecise or occasionally incorrect, etc.

4. IMPROVING WORST-CASE EFFICIENCY BY ALLOWING VAGUE ANSWERS

In general, a reasoning algorithm produces an answer to each given query. This answer is correct if it follows from the given knowledge base. Note that a correct answer can still be vague or imprecise. For example, given P(Hep | Jaundice) = 0.032, the answer “P(Hep | Jaundice) ∈ [0, 0.1]” is correct, but less precise. At the extreme, the answer “P(Hep | Jaundice) ∈ [0, 1],” while so vague as to be useless, is not wrong. Similarly, answering a propositional query with “IDK” (for “I don’t know”) is not incorrect; this vague answer is, in many situations, better than arbitrarily guessing (say) “No.” As a less extreme situation, a system that answers the question “What is the disease?” with “a bacteria” is correct, but less precise than


stating “Enterobacteriaceae”; similarly, a correct (if vague) answer to “Who are Stan’s uncles?” could be “7 adult men, all living in Kansas”; or perhaps “at least 2 adult men, and at least one living outside California”; etc.

Precision, like correctness, is relative to a given knowledge base. For example, stating that “Fido IsA Dog” is precise if that is all that the KB sanctions. However, this answer is imprecise with respect to a more specific KB that entails “Fido IsA Chihuahua.” This is especially true for systems that deal with “qualitative reasoning” [Weld and de Kleer 1990] or “qualitative belief nets” [Wellman 1990]. In general, we say an answer is precise (with respect to a knowledge base KB and query) if it is as specific as possible; otherwise it is considered imprecise (or vague, approximate). Note that this issue of precision is orthogonal to correctness—an answer can be precise, but wrong; e.g., imagine stating P(Hep | Jaundice) = 0.8 when it is actually 0.7. Of course, in some situations (correct but) vague answers may be sufficient—e.g., you may only need to know whether P(StockX goes up | · · ·) > 0.5 to decide on your next action; Section 6.3 explores this topic. In this section, we will consider reasoners that always return correct, but perhaps vague, answers.

To state this more precisely: In general, let α be an answer to the query ϕ; in the predicate calculus situation, this may correspond to ϕ[β] for some binding β, or perhaps ϕ[β1] & · · · & ϕ[βn] if the user was seeking all answers. In the probabilistic case, where ϕ is P(A | B), α could be “P(A | B) = 0.4” or “P(A | B) ∈ [0, 0.5].” Assume the correct, precise answer for the query ϕ is ϕ∗; this means we know that KB |= ϕ∗. The answer α is correct, but not necessarily precise, if |= ϕ∗ ⇒ α (where α must also satisfy some syntactic requirements, to ensure that it relates to the ϕ query).16 We view IDK as the tautology

16 Technically, we should write KB− |= ϕ∗ ⇒ α, where KB− contains the information required to connect the notation in ϕ∗ to α; e.g., to prove that A = 0.3 implies A ∈ [0, 0.4].

Fig. 7. Flow diagram of PS(S, W) addressing Σ |=? τ.

“true or false”; notice this answer is always correct. We can also consider a relative measure of precision, saying that α1 is more precise than α2 if |= α1 ⇒ α2; in fact, this induces a partial ordering. Note finally that a correct answer α is considered precise if also |= α ⇒ ϕ∗.

Imprecise Logical Reasoners: Many imprecise reasoners work by removing some (hopefully inconsequential) distinction from the knowledge base and the query, in order to simplify the reasoning process. This has led to the large body of work on approximation, often coupled with the related notion of abstraction [AAAI 1992; Ellman 1993; Sacerdoti 1973].

One specific example is a “Horn Approximation” reasoner [Selman and Kautz 1996]. These systems are motivated by the observation that, while inference from a general propositional theory is NP-hard, there are efficient algorithms for reasoning from a Horn theory. Now note that we can always bound a non-Horn theory Σ by a pair of Horn theories 〈S, W〉. (E.g., given Σ = {a ∨ b, ¬b ∨ c}, S = {a, ¬b ∨ c} is a “stronger (Horn) theory” while W = {¬b ∨ c} is a “weaker (Horn) theory.”) The reasoner, called PS(S, W) in Figure 7, can use this pair 〈S, W〉—called a “Horn approximation to Σ”—to answer Σ |=? τ: It first asks whether W |=? τ, and returns


“Yes” if that query succeeds. Otherwise, it asks whether S |=? τ, and returns “No” if that query fails. If neither test produces the definitive answer (i.e., if W ⊭ τ and S |= τ), the reasoner simply returns the imprecise “IDK”; see Figure 7.17 Note that the answers are always correct (e.g., W |= τ implies Σ |= τ, and S ⊭ τ implies Σ ⊭ τ); but they are not always as precise as the answers that would arise from the original Σ. (That is, if W ⊭ τ and S |= τ, then the query “Σ |=? τ” would return a categorical answer, but PS(S, W) does not.) There are many challenges here—e.g., there are many “maximal” strengthenings (which are NP-hard to find), and the optimal weakening can be exponentially larger than the initial theory. Section 6 below discusses an approach to address these issues.
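The control flow of Figure 7 is short enough to state directly; here is a sketch (ours) of PS(S, W), written against an abstract entailment oracle entails(theory, query)—in actual use S and W are Horn, so both calls are the efficient ones.

    def ps(strong_S, weak_W, tau, entails):
        if entails(weak_W, tau):        # W |= tau  implies  Sigma |= tau
            return "Yes"
        if not entails(strong_S, tau):  # S does not entail tau implies Sigma does not
            return "No"
        return "IDK"                    # the two Horn bounds are not decisive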

We can also consider stochastic algorithms here. One recent, prominent example is the GSAT algorithm [Selman et al. 1992], which attempts to solve satisfiability problems by hill-climbing and plateau-walking in the space of assignments, starting from a random initial assignment. That is, the score of an assignment, for a fixed SAT formula with m clauses,18 is the number of clauses that are satisfied; note this score ranges from 0 to m. At each stage, given a current assignment, GSAT sequentially considers changing each individual variable of that assignment. It then “climbs” to the new assignment with the highest score, and recurs. Here, if GSAT finds a satisfying assignment (i.e., an assignment with the score of m), it correctly reports that the problem has a solution. As GSAT is unable to determine if the SAT instance has no solution, it will terminate after some number of iterations and random walks and return the inconclusive “I don’t know.” As such, GSAT is a “LasVegas algorithm” [Hoos and Stutzle 1998], which knows when it knows the

17 Dalal and Etherington [1992a] discuss various extensions to this framework.
18 Each such formula is a specific conjunction of m clauses, where each clause is a disjunction of literals, where each literal is either a Boolean variable or its negation [Garey and Johnson 1979].

answer, and so only returns the correctanswer, or is silent.

It is worth commenting that, despite its inherent incompleteness, GSAT has proven extremely useful—being able to solve problems that no other current algorithm can solve. It, and related ideas, have since been used, very successfully, in planning research [Kautz and Selman 1996].
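For reference, a compact sketch (ours) of the GSAT loop on DIMACS-style clauses (each clause a list of signed integers); the parameters max_restarts and max_flips are illustrative, and sideways “plateau” moves happen automatically whenever the best flip does not improve the score.

    import random

    def gsat(clauses, n_vars, max_restarts=10, max_flips=100):
        def score(a):  # number of satisfied clauses under assignment a
            return sum(any((lit > 0) == a[abs(lit)] for lit in c)
                       for c in clauses)
        for _ in range(max_restarts):
            a = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
            for _ in range(max_flips):
                if score(a) == len(clauses):
                    return a          # a satisfying assignment: report "Yes"
                # climb to the best single-variable flip
                best = max(range(1, n_vars + 1),
                           key=lambda v: score({**a, v: not a[v]}))
                a[best] = not a[best]
        return "IDK"                  # GSAT cannot prove unsatisfiability

    print(gsat([[1, 2], [-2, 3]], 3))  # the Sigma = {a∨b, ¬b∨c} example above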

Imprecise Probabilistic Reasoners: As mentioned above, a probabilistic reasoner that returns only P(Hep | Jaundice) ∈ [0, 0.10], when the correct answer is P(Hep | Jaundice) = 0.032, is correct, but imprecise. Unfortunately, for a given belief net, even finding an approximately correct answer is hard; e.g., as noted above, getting an answer within an additive factor of 1/2 is NP-hard [Dagum and Luby 1993], as is getting answers that are within a multiplicative factor of 2^(n^(1−ε)) for any ε > 0 [Roth 1996].

There are, however, many systems that address this approximation task. Here we include algorithms that return answers that are guaranteed to be under-bounds, or over-bounds, of the correct value, as we view an under-bound of p as meaning the answer is guaranteed to be in the interval [p, 1]; and an over-bound of q means it is in [0, q].

Many approximation techniques are best described relative to the algorithm used to perform exact probabilistic inference, in general. (1) Dechter and Rish [1997] modify their Bucket Elimination algorithm (see Appendix A.2.1) to provide an approximate answer; they replace each bucket’s function with a set of “smaller” functions that each include only a subset of the functions associated with each variable. Those results hold for discrete variables. Jaakkola and Jordan [1996a, b] use a similar idea for continuous (Gaussian) variables: their algorithm sequentially removes each unassigned variable, replacing it with an approximation function that computes an under-bound (resp., over-bound) to the correct function.

(2) Horvitz and others have similarly “tweaked” the idea of cut-set conditioning (see Appendix A.2.2): Rather than consider all values of the cut-set variables, they get an approximation that is guaranteed to be within an additive factor of ε away from the correct value by summing over only those values whose probability mass collectively exceeds 1 − ε; see Horvitz et al. [1989].

Other approaches work by removing some aspect of the belief net—either nodes, or arcs, or values of a variable. The LPE (Localized Partial Evaluation) algorithm [Draper and Hanks 1994] maintains and propagates intervals during the inference process, and so can compute a range for the posterior probability of interest. LPE can disregard a variable (perhaps because it appears distant from the query nodes, or seems relatively irrelevant) by setting its value to [0, 1]. (In practice, this means the inference algorithm has fewer nodes to consider, and so typically is faster.) See also the work by Srinivas [1994] and Mengshoel and Wilkins [1997], who also consider removing some nodes from networks, to reduce the complexity of the computation.

Others attempt to reduce complexity by removing some arcs from a network. In particular, Kjaerulff [1994] proposes some heuristics that suggest which arcs should be removed, with respect to the clique-tree algorithm; see Appendix A.2.1. Van Engelen [1997] extends this idea by providing an algorithm that applies to arbitrary inference algorithms (not just the clique-tree approach) and also allows, in some cases, many arcs to be removed at once. He also bounds the error |P_{B↛}(H | E) − P(H | E)| obtained by using B↛ rather than (the correct) B to compute this conditional probability. (Here, B↛ is the network obtained by removing the Nr → Ns arc from the network B, P_{B↛}(H | E) is the probability value returned by this network, and P(H | E) is the result obtained by the original network.)

The Wellman and Liu approximation algorithm [Liu and Wellman 1997; Wellman 1994] leaves the structure of the belief net unaffected; it reduces the complexity of inference by instead abstracting the state of some individual variables—e.g., changing a variable that ranges over 10 states to one that only ranges over (say) 4 values, by partitioning the set of values of the original variable into a smaller set of subsets.

A related approach reduces the complexity of inference by disregarding values of variables which are relatively unlikely [Poole 1993a]. This reduction is especially large when the relevant variable(s) occur early in a topological ordering used by the reasoner; e.g., when each variable occurs before all of its children.

Recently, Dagum and Luby [1997] proved that networks without extreme values can be approximated efficiently. They begin with the observation that the hardness proofs for approximation tasks all involve CPtables that include 0s and 1s—which, in essence, means the network is being used to do something like logical inference. In general, if all of the values of the CPtables are bounded away from 0 (i.e., if all entries are greater than some λ > 0), then there is a sub-exponential time algorithm for computing a value that is guaranteed to be within a (bounded) multiplicative factor of optimal—i.e., an algorithm that returns an answer v̂ such that (1 − ε)v ≤ v̂ ≤ (1 + ε)v, where v is the correct answer and ε is a parameter supplied by the user, in time O(poly(1/ε) 2^((ln n)^O(d))), where d is the “depth” of the belief net and n is the number of nodes.

Comments: There are two obvious ways an approximate reasoner can interact with the user. First, the user can set some precision bounds, and expect the reasoner to return an answer that satisfies this requirement—although perhaps the reasoner will "charge" the user for this, by requiring more time to return an answer that is more precise. Of the results shown above, only Dagum/Luby's has this property. The others instead all return both an answer and the bounds, where the bounds specify how precise that answer is. If the resulting answer/bounds are not sufficient, the user can often adjust some parameters to get another answer/bound pair, where the precision (read "bound") is presumably better.


Note that all of these systems improve the efficiency of the underlying computation, but produce an answer that is only an approximation to the correct value. There is also a body of tricks that can sometimes reduce the average complexity of answering a query, with no degradation of precision; we mention some such tricks in Section 6.

5. IMPROVING WORST-CASE EFFICIENCY BY ALLOWING INCORRECT RESPONSES

Of course, not all reasoners provide answers that are always correct; some may occasionally return erroneous answers. We can evaluate such not-always-correct reasoners R using a distance measure d(a_q, R(q)) ∈ R≥0 that computes the distance between the correct answer a_q to the query q and the response R(q) that R returned. (Note this d(a, b) function could be 1 if a ≠ b, and 0 otherwise; such a function just determines whether R returned the correct answer or not.) R's "worst-case" score is then the largest such value over all queries q, max_q { d(a_q, R(q)) }; and its "average score" is the expected value, E_q[d(a_q, R(q))] = Σ_q P("q" asked) × d(a_q, R(q)), over the queries encountered. To evaluate a stochastic reasoner, we also average over the answers returned for any setting of the random bits.
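To make these two scores concrete, here is a minimal sketch; the reasoner, the query set, and the 0/1 distance are illustrative stand-ins, not constructs from the survey:

```python
def d(a, b):
    # 0/1 distance: 1 if the response differs from the correct answer
    return 0 if a == b else 1

def worst_case_score(reasoner, answers):
    # max over queries q of d(a_q, R(q)); answers maps q -> correct a_q
    return max(d(a_q, reasoner(q)) for q, a_q in answers.items())

def average_score(reasoner, answers, query_dist):
    # E_q[d(a_q, R(q))] = sum_q P(q asked) * d(a_q, R(q))
    return sum(query_dist[q] * d(a_q, reasoner(q))
               for q, a_q in answers.items())
```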

We will subdivide these not-correct (a.k.a. "unsound") systems into two groups: those that are deterministic (each time a query is posed, such a system returns the same, possibly incorrect, answer) versus those that are stochastic—i.e., that may return different answers to the same query, on different calls.

As one example of a well-motivated deterministic unsound algorithm, consider anytime algorithms [Zilberstein 1993; Zilberstein 1996]: while addressing a specific problem, these systems are required to produce an answer whenever requested, even before the system has run to completion. (Note this is appropriate if the reasoner really has to return an answer, to decide on some action before a hard deadline. Of course, "no action" is also a decision.) Here, these systems may have to guess, if they have not yet computed the appropriate answer. As another example, consider the variant of Horn approximation (see Figure 7) that returns Yes, rather than IDK, if the definitive tests both fail.

We will, however, focus on stochastic algorithms, where the reasoner uses some source of random bits during its computation.

Stochastic Logical-Reasoning Algorithms: As one obvious example, we can modify GSAT to return "NOT-satisfiable" (rather than "IDK") if no satisfying assignment is found. Once again, this is not necessarily correct. If this algorithm has performed a sufficient number of restarts, however, we can still be confident that "NOT-satisfiable" is correct.
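A minimal sketch of such an unsound GSAT variant, assuming a standard restart-based greedy loop (the clause encoding and parameter names are our own):

```python
import random

def gsat_unsound(clauses, n_vars, max_restarts=50, max_flips=1000):
    """GSAT variant that answers 'NOT-satisfiable' instead of 'IDK'.
    clauses: list of clauses, each a list of nonzero ints (DIMACS-style
    literals: v means variable v is true, -v means it is false)."""
    def num_satisfied(assign):
        return sum(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses)

    for _ in range(max_restarts):
        assign = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
        for _ in range(max_flips):
            if num_satisfied(assign) == len(clauses):
                return "satisfiable", assign
            # greedily flip the variable whose flip satisfies the most clauses
            best = max(range(1, n_vars + 1),
                       key=lambda v: num_satisfied({**assign, v: not assign[v]}))
            assign[best] = not assign[best]
    return "NOT-satisfiable", None   # possibly wrong: no proof of unsatisfiability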

Stochastic Probabilistic-Reasoning Algorithms: There is a huge inventory of algorithms that stochastically evaluate a belief net. The simplest such algorithm, "Logic Sampling," appears in Figure 8. This algorithm simply draws a number of samples (from the distribution associated with the belief net), and computes the empirical estimate of the query variable: of the times that the conditioning event E = e occurred (over the instances generated), how often was X equal to x_i, for each x_i in X's domain.
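Figure 8 itself is not reproduced here; the following is a minimal sketch of the estimator it describes, assuming a helper get_random(bn) (sketched after the walkthrough below) that draws one complete instance by forward sampling:

```python
def logic_sample(bn, X, evidence, n_samples=10000):
    """Estimate P(X = x_i | E = e) by rejection: keep only instances that
    match the evidence, then count the values of X among the kept ones."""
    M = 0                      # number of instances consistent with E = e
    m = {}                     # m[x_i]: of those, how many had X = x_i
    for _ in range(n_samples):
        inst = get_random(bn)  # one instance from the BN's distribution
        if all(inst[E] == e for E, e in evidence.items()):
            M += 1
            m[inst[X]] = m.get(inst[X], 0) + 1
    return {x: cnt / M for x, cnt in m.items()} if M > 0 else None
```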

This algorithm uses GETRANDOM(BN). To explain how this subroutine works, consider the BN_WG network shown in Figure 9. Here, GETRANDOM(BN_WG) would. . .

(1) Get a value for "Cloudy," by flipping a 0.55-coin. Assume the flip returns "Cloudy = True."

(2) Get a value for "Sprinkler," by flipping a 0.1-coin (as Cloudy = True, P(S | C = T) = 0.1). Assume the flip returns "Sprinkler = False."

(3) Get a value for "Rain," by flipping a 0.8-coin (as Cloudy = True, P(R | C = T) = 0.8). Assume the flip returns "Rain = True."


Fig. 8. Logical sampling.

(4) Get a value for "WetGrass," by flipping a 0.85-coin (as Sprinkler = F, Rain = T, P(W | ¬S, R) = 0.85). Assume the flip returns "WetGrass = True."

Here, GETRANDOM(BN_WG) would return the instance

〈Cloudy = True, Sprinkler = False, Rain = True, WetGrass = True〉.

On other calls, GETRANDOM(BN_WG) would see different results of the coin-flips, and so return other instances.
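A minimal sketch of this forward-sampling subroutine; the data layout (topologically ordered nodes, CPtables indexed by parent values) is our assumption, and the two CPtable entries not given in the text are marked as assumed:

```python
import random

def get_random(bn):
    """Draw one complete instance by sampling each node given its parents.
    bn: list of (name, parents, cpt) in topological order, where cpt maps
    a tuple of parent values to P(node = True)."""
    inst = {}
    for name, parents, cpt in bn:
        p_true = cpt[tuple(inst[p] for p in parents)]
        inst[name] = random.random() < p_true   # flip a p_true-coin
    return inst

# The BN_WG network of Figure 9 (numbers taken from the walkthrough):
BN_WG = [
    ("Cloudy",    [],         {(): 0.55}),
    ("Sprinkler", ["Cloudy"], {(True,): 0.1, (False,): 0.5}),
    ("Rain",      ["Cloudy"], {(True,): 0.8, (False,): 0.1}),
    ("WetGrass",  ["Sprinkler", "Rain"],
                  {(True, True): 0.99, (False, True): 0.85,
                   (True, False): 0.9,     # assumed: not given in the text
                   (False, False): 0.0}),  # assumed: not given in the text
]
```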

Note that the LOGICSAMPLE algorithm will, in essence, ignore an instance r if r[E] ≠ e. (Here r[E] denotes the tuple of the values, in r, of the variables in E.) This is clearly inefficient, especially if E = e is a rare event.

The LWSAMPLE algorithm (for Likelihood Weighted Sampling), shown in Figure 10, avoids this problem: the routine it uses to generate samples, GETLWRANDOM(·), is similar to GETRANDOM(·), except that it insists that each generated instance r has r[E_i] = e_i for each evidence variable E_i. However, while GETRANDOM(BN_WG) gave each instance a score of 1 (i.e., observe the "M += 1" and "m_i += 1" in Figure 8), the new GETLWRANDOM(BN, E = e) routine will instead use a "weight" of p = ∏_i P(E_i = e_i | U_i = u_i), where U_i are E_i's parents, and u_i is the current assignment to U_i.

For example, to estimate P(WetGrass | Rain) from BN_WG (Figure 9), the associated GETLWRANDOM would. . .

(1) Get a value for "Cloudy," by flipping a 0.55-coin. Assume the flip returned "Cloudy = False."

(2) Get a value for "Sprinkler," by flipping a 0.5-coin (as Cloudy = False, P(S | C = F) = 0.5). Assume the flip returned "Sprinkler = True."

(3) Now for "Rain." Note that this is an evidence variable; so set it to the required value, True. As Cloudy = False, P(R | C = F) = 0.1. So this "run" counts as p = 0.1.

(4) Get a value for "WetGrass," by flipping a 0.99-coin (as Sprinkler = T, Rain = T, P(W | S, R) = 0.99). Assume the flip returned "WetGrass = True."


Fig. 9. Simple Belief Net BN_WG (adapted from Pearl [1988] and Russell and Norvig [1995]).

Here, the associated tuple is

〈Cloudy = False, Sprinkler = True, Rain = True, WetGrass = True〉

and the probability p = 0.1; LWSAMPLE would therefore increment both M and m_{WG=T} by 0.1.
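A matching sketch of likelihood-weighted sampling (our reconstruction of Figure 10's LWSAMPLE and GETLWRANDOM, reusing the BN_WG structure sketched above):

```python
import random

def get_lw_random(bn, evidence):
    """Like get_random, but evidence variables are clamped to their observed
    values; the instance's weight is the product of P(E_i = e_i | parents)."""
    inst, weight = {}, 1.0
    for name, parents, cpt in bn:
        p_true = cpt[tuple(inst[p] for p in parents)]
        if name in evidence:
            inst[name] = evidence[name]
            weight *= p_true if evidence[name] else (1.0 - p_true)
        else:
            inst[name] = random.random() < p_true
    return inst, weight

def lw_sample(bn, X, evidence, n_samples=10000):
    """Estimate P(X = True | evidence) by accumulating weights, not counts."""
    M = m_true = 0.0
    for _ in range(n_samples):
        inst, w = get_lw_random(bn, evidence)
        M += w
        if inst[X]:
            m_true += w
    return m_true / M if M > 0 else None

# e.g., lw_sample(BN_WG, "WetGrass", {"Rain": True})
```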

There are many other stochastic algorithms that have been used for belief net inference—e.g., based on Importance Sampling, Monte Carlo sampling [Pradhan and Dagum 1996], etc.—as well as algorithms that combine stochastic algorithms for some parts with exact reasoning for other nodes. Many of these are described in Cousins et al. [1993].

Note, however, that adding "randomness" does not seem to help in the worst case, as it is probably hard to find answers that are, with high probability, approximately correct: no polynomial-time algorithm can generate randomized approximations of the probability P(X = x | E = e) with absolute error ε < 1/2 and failure probability δ < 1/2, unless NP ⊆ RP [Dagum and Luby 1993].

6. IMPROVING AVERAGE EFFICIENCY

Note that the specific hardness results mentioned above are "worst case"—i.e., there is a parameterized class of decision problems that exhibits this exponential scaling [Garey and Johnson 1979]. What if these particular problematic problems are not actually encountered in practice? In general, we would like to implement a system whose average efficiency is as high as possible, where "average" is with respect to the distribution of tasks actually encountered. The first two subsections therefore consider ways to improve a (logic-based, probabilistic) reasoner's average efficiency. Section 6.3 extends these ideas to consider ways to improve the average performance, where "performance" is a user-defined function that can inter-relate correctness, precision and efficiency, each weighted as appropriate to the task.

The approaches described below relate to the ideas discussed in Section 3, of simply defining away such problematic tasks—i.e., using reasoners that can only handle a subclass of tasks that excludes these expensive situations. The difference is in how the decision (of which reasoner to use) was reached. There, the decision was based (implicitly) on the hope that the problematic situations would not occur; here, instead, the system knows something about which situations will occur, and so employs algorithms designed to do well for these situations.

Fig. 10. Likelihood weighted sampling.

If the system designer knows a lot about the way the eventual reasoner will be used, that designer could engineer in the specific algorithm that works best for this subspace of tasks. For example, if the designer knows that the attribute values are conditionally independent (given the classification), it makes sense to build a naive-Bayes classifier; similarly, if the system never needs to reason by cases, the PROLOG inference process is typically quite appropriate. Alternatively, the reasoner could "learn" information about this distribution of tasks, enough to identify a reasoner (perhaps from a set of parameterized reasoning algorithms) that performs best; see Greiner [1996]. Below we will consider both situations—where the designer does, versus does not, know the performance task.

6.1 Efficient (on Average) Logical Reasoners

In general, a logic-reasoning system must decide, at each step, which inference or rewrite rule to use, and on which set of statements. Even resolution-based systems, which consider only a single inference rule (and so have no choice in that respect), must still decide which clauses to select. While general resolution schemes (such as "set of support" or "lock" [Genesereth and Nilsson 1987]) may constrain this selection, there can still be many possible options. The logical-reasoning community has therefore investigated a variety of techniques for improving the performance of logic-reasoning systems, often by providing some explicit meta-level control information for controlling the inference process; cf., Smith and Genesereth [1985], Smith et al. [1986]. This is clearly related to the work by the database community on join ordering, magic sets, and the like [Swami and Gupta 1988; Azevedo 1997], and to the large body of heuristics used to solve "Constraint Satisfaction Problems" [Kondrak and van Beek 1997].

Each of these optimizations is specific to a single problem instance, and many can be viewed as a preprocessing step, called before beginning to solve the problem. Note, however, that these modifications do not affect the underlying reasoning system; in particular, if that reasoner encountered the exact same problem a second time, it would presumably follow the same (preprocess and) process steps.

As this preprocessing step is typically very expensive, another body of work performs only a single "preprocessing" step prior to solving a set of queries (rather than one such step before each query). These systems can be viewed as modifying the underlying reasoner, to produce a new reasoner that does well (on average) over the distribution of queries it will encounter. Note this resulting reasoner will then solve each single query directly, without first performing any additional query-specific preprocessing step.

To make this more concrete, consider a resolution system that uses a clause ordering Θ to specify when to use which clause during the proof process: here, it attempts to resolve the current subgoal (determined by the depth-first order) with the (untried) clause at the highest position in Θ. (This can correspond to PROLOG's control strategy, when the ordering is determined by the chronological order of assertion.) Of course, the time required to find a solution to a specific query depends critically on the clause-ordering Θ used. A useful modification would therefore change the clause ordering to one whose average time, over the distribution of queries, is minimum. The resulting reasoner would then use this (new) fixed order in answering any query.
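As a toy illustration of the objective being minimized, the expected time of an ordering can be computed under the simplifying assumption (entirely our own) that each clause has an independent success probability and a fixed cost to try:

```python
def expected_time(order, cost, p_succ):
    # Expected time to answer a query by trying clauses in the given order:
    # clause i costs cost[i] to try and (independently, by assumption)
    # succeeds with probability p_succ[i]; we stop at the first success.
    total, p_reach = 0.0, 1.0   # p_reach: prob. every earlier clause failed
    for i in order:
        total += p_reach * cost[i]
        p_reach *= 1.0 - p_succ[i]
    return total
```

Under this independence model, sorting clauses by increasing cost[i]/p_succ[i] minimizes the expected time; the real problem is harder, since success probabilities depend on the query distribution and clauses interact.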

In general, the system designer may not know, at "compile time," what queries will be posed. However, given that the reasoner will be used many times, a learner19 may be able to estimate this distribution of tasks by watching the (initial) reasoner as it is solving these tasks. The learner can then use this distributional information to identify the algorithm that will work best. Here, the overall system (including both reasoner and learner) would have to pay the expense of solving problems, using standard general methods for the first few tasks, before obtaining the information required to identify (and then use) the best special-purpose algorithm.

This has led to the body of work on "explanation-based learning" (EBL): after solving a set of queries, an EBL system will analyze these results, seeking ways to solve these same queries (and related ones) more efficiently when they are next encountered; see Mitchell et al. [1986], Greiner [1999]. As the underlying optimization task is NP-hard [Greiner 1991], most of these systems hill-climb, from an initial strategy to a better one [Greiner 1996; Gratch and DeJong 1996].

There are other tricks for improving a reasoner's average efficiency. Recall that our interface to the reasoner is via the Tell and Ask subroutines. In PROLOG, the Tell routine is trivial; the Ask routine must do essentially all of the reasoning, to answer the query. Other systems (including those built on OPS [Brownston et al. 1986], such as XCON [Barker et al. 1989]) do the bulk of the inference in a "forward chaining" manner—here, the Tell operator will forward-chain to assert various newly-entailed propositions. Answering certain questions is trivial in this situation, especially if we know that the answer to the query will be explicitly present iff it is entailed. Treitel and Genesereth [1987] consider "mixed"

19 Note this learner is learning about the usage patterns of the reasoner, to help in providing control information, and not about the domain per se.

systems, where both Ask and Tell share the load: when their system is Telled a new fact, it will forward-chain, but only using certain rules and to a certain depth, to establish a "boundary" in the associated "inference graph."20 Afterwards, when the Ask process is answering some query, it will backward-chain, but only following certain rules, and only until just reaching the "boundary" set by the earlier forward-chaining steps. In general, a "scheme" specifies which rules are used in a forward-chaining, vs backward-chaining, fashion, etc. Finding the optimal scheme, which minimizes the total computational time (of both forward-chaining and asserting, and backward-chaining and subgoaling), is challenging—as it requires knowing (or at least estimating) the distribution of queries that will be asked, as well as the distribution of the new information that will be asserted; and then (typically) solving an NP-hard problem.

This is similar to the work on caching solutions found during a derivation, to improve the performance on later queries. Chaudhri and Greiner [1992] present an efficient algorithm, for a certain class of rule-sets, that specifies which results should be stored, as a function of the frequency of queries and updates, the costs of storage, etc.

6.2 Efficient (on Average) Belief Net Inference

Most BN inference algorithms perform their inference based only on the specific belief net involved. Some systems will also examine the specific query in a "preprocessing step," and modify their algorithms and data structures accordingly—e.g., remove nodes "d-separated" from the query and evidence nodes [Pearl 1988] (including "barren nodes" [Zhang and Poole 1994]), find an ordering for the process that is appropriate given the nodes that will be instantiated [Dechter 1998], and so forth.

20 This hyper-graph has a node for each proposition, connected by hyperlinks, each of which connects the conclusion of a rule to the rule's set of antecedents.


As mentioned in the earlier logic-based situation, such preprocessing can be expensive—indeed, it could involve solving the underlying inference process, or some other NP-hard problem. An effective inference process may therefore, instead, try to find an inference procedure that is optimal for the distribution of queries it will encounter. Here, it will do some modification once, rather than once per query.

Herskovits and Cooper [1991] consider caching the answers to the most common queries; here, the reasoner can simply return these explicitly stored answers when those particular queries are posed; otherwise, the reasoner will do a standard belief net computation. (Those researchers used an analytic model, obtained from the belief net itself, to induce a distribution over queries, and used this to determine which queries will be most common.)
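A minimal sketch of this cache-then-fall-back arrangement (class and parameter names are ours; full_inference stands for any standard belief net computation):

```python
class CachingReasoner:
    """Answer the most common queries from an explicit cache; fall back to
    a standard belief net computation otherwise."""

    def __init__(self, precomputed, full_inference):
        self.cache = dict(precomputed)        # query -> stored answer
        self.full_inference = full_inference  # callable: query -> answer

    def ask(self, query):
        if query in self.cache:
            return self.cache[query]          # explicitly stored answer
        return self.full_inference(query)     # standard BN computation
```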

The QUERYDAG work [Darwiche and Provan 1996; Darwiche and Provan 1997] takes a different cut: after the designer has specified the belief net queries that the performance system will have to handle, the QUERYDAG system "compiles" the appropriate inference system, to produce a well-honed system that is efficient for these queries. (Even if this compiled version performed the same basic computations that the basic inference algorithm would perform, it still avoids the run-time look-ups to find the particular nodes, etc. In addition, other optimizations are possible.)

The approach presented in Delcher et al. [1996], like the Treitel/Genesereth [1987] idea, does some work at data-input time (here, "absorbing evidence") as a way to reduce the time required to answer the subsequent queries (i.e., compute the posterior probability), in the context of poly-trees: rather than spending O(1) time to absorb evidence and then O(N) time to compute the answer to a query (from an N-node polytree), their algorithm takes O(ln N) time to absorb each bit of evidence, then O(ln N) time to compute the probabilities.

As another fact: Appendix A.2 shows there are several known algorithms {QA_i} for answering queries from a BN. While one such algorithm QA_A may be slow for a particular query, it is possible that another algorithm QA_B may be quite efficient for the same query. Moreover, different BNs can express the same distribution, and hence provide the same answers to all queries. Even if a specific QA_i algorithm is slow for a given query when using one net BN_α, that same algorithm may be efficient for this query if it uses a different, but equivalent, net BN_β.

We might therefore seek the "most efficient" 〈B, QA〉 combination—i.e., determine

–which algorithm (with which parameters) we should run,
–on which belief net (from the set of equivalent BNs),

to minimize the expected time to answer queries, where this "expected time" is averaged over the distribution of queries that will be encountered. (Hence, this task corresponds to the query-based model presented in Greiner et al. [1997], but deals with efficiency, rather than accuracy.)

The search for the best algorithm/encoding does not need to be "blind," but can incorporate known results—e.g., we could avoid "CutSet conditioning" if we show that it will always require more computations than any of the junction-tree algorithms [Dechter 1996]; we can also use the empirical evaluations by Lepar and Shenoy [1998], which suggest that the Shafer–Shenoy algorithm is more efficient than the Hugin algorithm, and that both improve on the original Lauritzen–Spiegelhalter.

6.3 Improving Average Performance

The bulk of this paper has implicitly assumed we want our (expressive) reasoner to be as precise and correct as possible, while minimizing computational time. As noted above, an agent may need far less than this: e.g., it may be able to act effectively given only a correct but vague answer (e.g., whether some probability is > 1/2 or not), and it may survive if the answers it receives are correct at least (say) 90% of the time. Similarly, it may only need answers within 1 minute; getting answers earlier may not be advantageous, but getting them any later may be disastrous.

This motivates us to seek a reasoner whose "performance" is optimal, where "performance" is perhaps some combination of these {correctness, precision, expressiveness, efficiency} criteria. There are two obvious challenges here. The first is determining the appropriate performance measure—i.e., defining a function that assigns a score to each reasoner, which can be used to compare a set of different reasoners (to decide which is best), and also to evaluate a single reasoner (to determine whether it is acceptable—i.e., above a threshold). Of course, this criterion is extremely task-dependent.

As discussed in Greiner and Elkan [1991], one approach is to first define a general "utility function" for a reasoner and query, that may combine the various criteria into a single score. One space of scores corresponds to linear combinations of precision, correctness, expressiveness and time; e.g., for a reasoner R and query q,

u(R, q) = ν_Prec × m_Prec(R, q) + ν_Time × m_Time(R, q) + · · ·

where each m_χ(R, q) measures the reasoner R's χ feature when dealing with the query q, and each ν_χ ∈ R is a real value. The quality measure for a reasoner would then be its average utility score, over the distribution D_q of queries,

U(R) = E_{q ∈ D_q}[u(R, q)].

We would then prefer the reasoner that has the largest score. (We could, alternatively, let this individual query score be a combination of thresholded values: first associate with each query a set of "tolerances" required for time, precision, etc.—e.g., perhaps the "tolerance query" q = 〈P(fire | smoke, ¬alarm), ±0.25, ≤ 2, . . .〉 means we need to know the probability of fire, given the evidence specified, to within ±0.25, and get the answer within 2 seconds, etc. The utility score a reasoner R receives for this query q could then be defined as a weighted sum of precision and time, e.g.,

u(R, q) = η_Prec × δ[m_Prec(R, q) ≤ 0.25] + η_Time × δ[m_Time(R, q) ≤ 2] + · · ·

where δ[· · ·] is 1 if the condition is met, and 0 otherwise.) Or we could use a more complicated function that inter-relates these quantities, perhaps using a more complicated combination function, etc. We assume the system designer has produced such a function, probably based on decision-theoretic considerations.
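Both scoring styles transcribe directly into code; a small sketch (feature names, weights, and tolerances are illustrative):

```python
def utility_linear(m, nu):
    # u(R, q) = sum over features f of nu[f] * m_f(R, q)
    return sum(nu[f] * m[f] for f in nu)

def utility_thresholded(m, tol, eta):
    # u(R, q) = sum over features f of eta[f] * delta[m_f(R, q) <= tol_f]
    return sum(eta[f] * (1 if m[f] <= tol[f] else 0) for f in eta)

# e.g., m = {"prec": 0.2, "time": 1.4} measured on one query:
# utility_thresholded(m, {"prec": 0.25, "time": 2}, {"prec": 1, "time": 1})
# returns 2, since both tolerances are met.
```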

The second challenge is to identify the reasoner that is best, given this quality measure. As noted above, this will depend on the distribution of queries posed. While this may be unknown initially, we can use the standard techniques of sampling queries from the underlying distribution of queries to estimate the relevant information. Unfortunately, even if we knew the distribution precisely, we are left with the task of finding the reasoner that is optimal with respect to this scoring function. In many situations, this is NP-hard or worse; see Greiner [1991], Greiner and Schuurmans [1992]. The PALO learning system [Greiner 1996] was designed for this situation: this algorithm hill-climbs in the space of these reasoners, while collecting the samples it needs to estimate the quality of these reasoners. Greiner [1996] demonstrates, both theoretically and empirically, that this algorithm works efficiently, in part because it only seeks a local optimum, exploiting the local structure.

7. CONCLUSION

Many applications require a reasoner, often embedded as a component of some larger system. To be effective, this reasoner must be both reliable and efficient. This survey provides a collection of techniques that have been (or at least should be) used to produce such a reasoner.


Fig. 11. Simple non-poly-tree belief network.

Of course, this survey provides only a sampling of the techniques. To keep the paper relatively short, we have had to skip many other large bodies of ideas. For example, we have not considered techniques that radically change the representation, perhaps by reexpressing the information as a neural net [Towell and Shavlik 1993], or by reasoning using the characteristic models of the theory [Kautz et al. 1993; Khardon and Roth 1994], rather than the theory itself.

We also chose to focus on techniques specific to reasoning, and so bypass the huge inventory of techniques associated with improving the efficiency of computations in general—including clever compilation techniques, and ways to exploit parallel algorithms. Needless to say, these ideas are also essential to producing reasoning algorithms that are as efficient as possible.

To summarize: we have overviewed a large variety of techniques for improving the effectiveness of a reasoning system, considering both sound logical reasoners (focussing on Horn clauses) and probabilistic reasoners (focussing on belief nets). In general, these techniques embody some (perhaps implicit) trade-off, where the system designer is willing to sacrifice some desirable property (such as expressiveness, precision, or correctness) to increase the system's efficiency.

We also discussed the idea of combining all of these measures into a single quality measure, and pointed to an algorithm that can "learn" the reasoning system that is optimal with respect to this measure.

APPENDIX: REASONING USING BELIEF NETS

A.1 Why is BN Inference Hard?

In general, probabilistic inference involves computing the posterior distribution of some query variable Q conditioned on some evidence E = e—i.e., computing P(Q | E = e). Pearl [1988] provided a linear-time algorithm for belief net inference for networks that have a poly-tree structure—where there is at most one (undirected) path connecting any pair of nodes. This algorithm simply propagates a few numbers from each node X to its immediate neighbors (children and parents), and then recurs in the simpler structure obtained by deleting this X.21 Unfortunately, this idea does not work in general networks, which can contain (undirected) cycles. The basic problem here is that these (to-be-deleted) variables may induce dependencies among the other variables, which can make a difference in places far away in the network. To see this, consider asking for the (unconditional) probability that C is true, P(C = t), from the network shown in Figure 11. Here, a naive (but incorrect) algorithm would first see that Z

21 The actual algorithm runs in two phases: first going from the roots down to the leaves, and then going from the leaves up to the roots.


Fig. 12. Poly-tree BN equivalent to Figure 9 (adapted from Pearl [1988] and Russell and Norvig [1995]).

is true half of the time, which means that A and B are each true half of the time (as A is the same as Z and B is the same as ¬Z), which suggests that C, which is true only if both A and B are true, is true with probability 1/2 × 1/2 = 1/4. This is clearly wrong: A is true only when B is false, and vice versa. Hence, there is 0 chance that both A and B are true, and hence C is never true—i.e., P(C = t) = 0. The issue, of course, is that we cannot simply propagate information from Z, and then forget about this source; there is instead a dependency between A and B, induced by their common parent Z. A related, but more complex, argument shows that such reasoning is, in fact, NP-hard: here we can encode an arbitrary 3SAT problem by including a node that represents the boolean formula, connected to nodes that represent the clauses (with a CPtable that ensures that the formula is true iff all of the clauses are true), and the clause-nodes are each connected to nodes that represent the boolean variables (with CPtables that ensure that each clause node is true iff the associated boolean variables have the correct setting). Then the formula has a satisfying assignment iff the associated formula-variable has an unconditional probability strictly greater than 0; see Cooper [1990].
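A small check of the worked example, enumerating the joint distribution that Figure 11 describes (network transcribed from the text):

```python
# Figure 11's network, as described in the text: Z is true half the time,
# A is the same as Z, B is the same as not Z, and C = A and B.
p_c = 0.0
for z in (True, False):
    a, b = z, not z
    if a and b:      # C is true only if both A and B are true
        p_c += 0.5   # P(Z = z) = 0.5
print(p_c)           # prints 0.0: P(C = t) = 0, not the naive 1/2 * 1/2 = 1/4
```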

A.2 Probabilistic Inference, Using Belief Networks

There are two major categories of algorithms for computing posterior probabilities from general belief nets. Both rely on the observation that there is an efficient algorithm for computing arbitrary posterior probabilities for a poly-tree structure, and (perhaps implicitly) use this algorithm as a subroutine.

A.2.1 Clustering. The "clustering" trick involves converting the given belief network into an equivalent network that is a poly-tree, by merging various sets of nodes into associated "mega-nodes"—e.g., consider transforming the network in Figure 9 to the network in Figure 12. Note that this new structure may be exponentially bigger than the original belief net, as the CPtables for the mega-node can be huge: if formed from the nodes N = {N_1, . . . , N_k}, it will include a row for each combination of values from N, and so (if each N_i is binary) it will have 2^k rows. (So while BN-inference will be linear in the size of the new poly-tree network, that size will be exponential in the size of the initial belief network.)

There can be many ways to form such clusters; see Lauritzen and Spiegelhalter [1988], as well as implementation details for the Hugin system [Andersen et al. 1989; Jensen et al. 1990]. The general algorithm first "moralizes" the network (by connecting—aka "marrying"—the co-parents of each node), then triangulates the resulting graph, to form G′. It then forms a "junction tree" T = 〈N_T, A_T〉—a tree structure whose nodes each correspond to the maximal cliques in G′, and whose arcs a = 〈n_1, n_2〉 ∈ A_T are each labeled with the nodes in the intersection between the G′-nodes that label n_1 and


Fig. 13. Example of cutset conditioning (adapted from Pearl [1988] and Russell and Norvig [1995]).

n_2. The algorithm then uses, in essence, the poly-tree algorithm to produce an answer to the original query.

This approach is also called the "junction tree" algorithm, or "clique tree" algorithm. See Lepar and Shenoy [1998] for a specification of three of these algorithms: Lauritzen–Spiegelhalter, Hugin, and Shenoy–Shafer.

Bucket Elimination: There are several simpler ways to understand this basic computation, including Li and D'Ambrosio's SPI [Li and D'Ambrosio 1994], Zhang and Poole's algorithm [Zhang and Poole 1994], and Dechter's bucket elimination [Dechter 1998]. We will focus on the third, which performs belief updating by storing the CPtables in a set of buckets—where the ith bucket contains just those CPtables that involve only variables whose largest index is i. (Note that this algorithm requires an ordering of the nodes.) It then sequentially eliminates (read "marginalizes away") each variable X_i, updating the remaining buckets appropriately, by including in the appropriate (lower) bucket the marginals left after removing the dependencies on X_i.

Dechter shows that this relatively straightforward algorithm is in fact doing the same computation as the general clustering algorithm, and has the same worst-case complexity. She also proves that (a small variant of) this algorithm corresponds to the poly-tree algorithm.
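A minimal sketch of one elimination ("bucket") step for boolean variables, with factors stored as explicit tables (the representation is our choice, not Dechter's):

```python
from itertools import product

def eliminate(factors, var):
    """One bucket-elimination step: multiply all factors mentioning var,
    then sum var out. A factor is (vars_tuple, table), with table mapping
    value-tuples to probabilities; all variables are boolean here."""
    bucket = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    if not bucket:
        return rest
    new_vars = tuple(sorted({v for vs, _ in bucket for v in vs} - {var}))
    table = {}
    for vals in product((True, False), repeat=len(new_vars)):
        asg = dict(zip(new_vars, vals))
        s = 0.0
        for x in (True, False):          # marginalize var away
            asg[var] = x
            prod_ = 1.0
            for vs, t in bucket:
                prod_ *= t[tuple(asg[v] for v in vs)]
            s += prod_
        table[vals] = s
    return rest + [(new_vars, table)]
```

Calling eliminate once per variable in the chosen ordering, then multiplying the surviving factors, yields the desired (unnormalized) marginal.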

A.2.2 Cut-Set Conditioning. A "cut-set conditioning" algorithm also uses the poly-tree algorithm, but in a different manner. Given a belief net B = 〈N, A, CP〉 with nodes N, and query P(H | E), this algorithm first finds a subset X = {X_1, . . . , X_k} ⊂ N of nodes, such that

–B_X = B − X, the BN without X, is a poly-tree, and

–P(X | E) is easy to compute.

See Suermondt and Cooper [1991]. It then exploits the equality

P(H | E) = Σ_x P(H | E, X = x) × P(X = x | E)

to answer the query P(H | E). Note that each summand is easy to compute, as P(X | E) is easy by construction, and P(H | E, X = x) is a poly-tree computation. Figure 13 illustrates this construction.

The run-time for this algorithm is exponential in |X| (as it must sum over ∏_{X∈X} |Domain(X)| terms). However, its space requirement is linear in |X|, as it need only maintain the running tally.
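A minimal sketch of the conditioning sum itself; the poly-tree query and the cut-set posterior are stand-in callables (the latter being easy by construction, per the text):

```python
from itertools import product

def cutset_condition(cutset_domains, polytree_query, cutset_posterior):
    """P(H | E) = sum over assignments x to the cut-set X of
    P(H | E, X = x) * P(X = x | E). Only a running tally is kept,
    so space is linear in the cut-set size."""
    names = list(cutset_domains)
    total = 0.0
    for vals in product(*(cutset_domains[n] for n in names)):
        x = dict(zip(names, vals))
        total += polytree_query(x) * cutset_posterior(x)
    return total
```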

ACKNOWLEDGMENTS

We thank Gary Kuhn for his many helpful comments.


REFERENCES

AAAI. 1992. Workshop on Approximation and Abstraction of Computational Theories. Tech. Rep., AAAI.

ABDELBAR, A. M. AND HEDETNIEMI, S. M. 1998. Approximating MAPs for belief networks is NP-hard and other theorems. Artificial Intelligence 102, 21–38.

ALCHOURRON, C. E., GARDENFORS, P., AND MAKINSON, D. 1985. On the logic of theory change: partial meet contraction and revision functions. Journal of Symbolic Logic 50, 510–530.

ANDERSEN, S. K., OLESEN, K. G., JENSEN, F. V., AND JENSEN, F. 1989. HUGIN—a shell for building Bayesian belief universes for expert systems. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (IJCAI-89), Detroit, Michigan, Aug. 1989. Morgan Kaufmann, San Mateo, CA, vol. 2, 1080–1085.

ARNBORG, S. 1985. Efficient algorithms for combinatorial problems on graphs with bounded decomposability—a survey. BIT 25, 2–33.

AZEVEDO, P. J. 1997. Magic sets with full sharing. Journal of Logic Programming 30, 3 (March), 223–237.

BACCHUS, F., GROVE, A., HALPERN, J., AND KOLLER, D. 1996. From statistical knowledge bases to degrees of belief. Artificial Intelligence 87, 75–143.

BARKER, V. E., O'CONNOR, D. E., BACHANT, J., AND SOLOWAY, E. 1989. Expert systems for configuration at Digital: XCON and beyond. Communications of the ACM 32, 3 (March), 298–318.

BEINLICH, I. A., SUERMONDT, H. J., CHAVEZ, R. M., AND COOPER, G. F. 1989. The ALARM monitoring system: a case study with two probabilistic inference techniques for belief networks. In Proceedings of the Second European Conference on Artificial Intelligence in Medicine, London, Aug. 1989. Springer, Berlin.

BOBROW, D. G. 1980. Artificial Intelligence. Special issue on non-monotonic logic.

BOROS, E., CRAMA, Y., AND HAMMER, P. 1990. Polynomial-time inference of all valid implications for Horn and related formulae. Annals of Mathematics and Artificial Intelligence 1, 21–32.

BRACEWELL, R. N. 1978. The Fourier Transform and Its Applications. McGraw-Hill, New York.

BROWNSTON, L., FARRELL, R., KANT, E., AND MARTIN, N. 1986. Programming Expert Systems in OPS5. Addison-Wesley, Reading, MA.

CHANG, C.-L. AND LEE, R. C.-T. 1973. Symbolic Logic and Mechanical Theorem Proving. Academic Press, New York.

CHARNIAK, E. 1991. Bayesian networks without tears. AI Magazine 12, 50–63.

CHAUDHRI, V. AND GREINER, R. 1992. A formal analysis of solution caching. In Proceedings of the Ninth Canadian Conference on Artificial Intelligence, Vancouver, 1992.

CHOW, C. K. AND LIU, C. N. 1968. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory 14, 3, 462–467.

CLARK, K. 1978. Negation as failure. In Logic and Data Bases, H. GALLAIRE AND J. MINKER, Eds. Plenum Press, New York, 293–322.

CLOCKSIN, W. F. AND MELLISH, C. S. 1981. Programming in Prolog. Springer, New York.

COOPER, G. 1990. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence 42, 2–3, 393–405.

COUSINS, S., CHEN, W., AND FRISSE, M. 1993. A tutorial introduction to stochastic simulation algorithms for belief networks. Artificial Intelligence in Medicine 5, 315–340.

DAGUM, P. AND LUBY, M. 1993. Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence 60, 1 (March), 141–153.

DAGUM, P. AND LUBY, M. 1997. An optimal approximation algorithm for Bayesian inference. Artificial Intelligence 93, 1–27.

DALAL, M. AND ETHERINGTON, D. W. 1992a. Tractable approximate deduction using limited vocabulary. In Proceedings of the Ninth Canadian Conference on Artificial Intelligence, Vancouver, May 1992.

DALAL, M. AND ETHERINGTON, D. W. 1992b. A hierarchy of tractable satisfiability problems. Information Processing Letters 44, 4 (Dec.), 173–180.

DARWICHE, A. AND PROVAN, G. M. 1996. Query DAGs: a practical paradigm for implementing belief network inference. In Proc. Uncertainty in AI.

DARWICHE, A. AND PROVAN, G. M. 1997. A standard approach for optimizing belief network inference using query DAGs. In Proc. Uncertainty in AI.

DECHTER, R. 1996. Topological parameters for time-space tradeoff. In Proc. Uncertainty in AI, San Francisco, 1996, E. HORVITZ AND F. JENSEN, Eds. Morgan Kaufmann, San Mateo, CA, 220–227.

DECHTER, R. 1998. Bucket elimination: a unifying framework for probabilistic inference. In Learning and Inference in Graphical Models.

DECHTER, R. AND RISH, I. 1997. A scheme for approximating probabilistic inference. In Proc. Uncertainty in AI.

DELCHER, A. L., GROVE, A., KASIF, S., AND PEARL, J. 1996. Logarithmic-time updates and queries in probabilistic networks. J. Artificial Intelligence Research 4, 37–59.

DONINI, F. M., LENZERINI, M., NARDI, D., AND NUTT, W. 1997. The complexity of concept languages. Information and Computation 134, 1 (10 April), 1–58.


DOWLING, W. F. AND GALLIER, J. H. 1984. Linear-time algorithms for testing the satisfiability of propositional Horn formulae. Journal of Logic Programming 3, 267–284.

DOYLE, J. AND PATIL, R. 1991. Two theses of knowledge representation: language restrictions, taxonomic classification, and the utility of representation services. Artificial Intelligence 48, 3, 261–297.

DRAPER, D. L. AND HANKS, S. 1994. Localized partial evaluation of belief networks. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, San Francisco, CA, July 1994, R. L. DE MANTARAS AND D. POOLE, Eds. Morgan Kaufmann, San Mateo, CA, 170–177.

ELLMAN, T. 1993. Abstraction via approximate symmetry. In IJCAI-93.

ENDERTON, H. B. 1972. A Mathematical Introduction to Logic. Academic Press, New York.

FELLER, W. 1966. An Introduction to Probability Theory and Its Applications II. John Wiley, New York.

FINDLER, N. V., Ed. 1979. Associative Networks: Representation and Use of Knowledge by Computers. Academic Press, New York.

FREUDER, E. C. 1996. Principles and Practice of Constraint Programming. Springer, New York.

FRIEDMAN, N., GEIGER, D., AND GOLDSZMIDT, M. 1997. Bayesian network classifiers. Machine Learning 29, 131–163.

GAREY, M. R. AND JOHNSON, D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, New York.

GENESERETH, M. R. AND NILSSON, N. J. 1987. Logical Foundations of Artificial Intelligence. Morgan Kaufmann, Los Altos, CA.

GENESERETH, M. R. AND FIKES, R. E. 1992. Knowledge Interchange Format, Version 3.0 Reference Manual. Tech. Rep. Logic-92-1 (June), Computer Science Dept., Stanford Univ.

GHAHRAMANI, Z. AND JORDAN, M. I. 1997. Factorial hidden Markov models. Machine Learning 29, 245.

GINSBERG, M. 1987. Readings in Nonmonotonic Reasoning. Morgan Kaufmann, Los Altos, CA.

GINSBERG, M. L. 1991. Knowledge Interchange Format: The KIF of Death. Tech. Rep., Stanford Univ.

GRATCH, J. AND DEJONG, G. 1996. A decision-theoretic approach to adaptive problem solving. Artificial Intelligence 88, 1–2, 365–396.

GREINER, R. 1991. Finding the optimal derivation strategy in a redundant knowledge base. Artificial Intelligence 50, 1, 95–116.

GREINER, R. 1996. PALO: a probabilistic hill-climbing algorithm. Artificial Intelligence 83, 1–2 (July), 177–204.

GREINER, R. 1999. Explanation-based learning. In The Encyclopedia of Cognitive Science, R. WILSON AND F. KEIL, Eds. MIT Press, Cambridge, MA, 301–303. A Bradford Book.

GREINER, R. AND ELKAN, C. 1991. Measuring and improving the effectiveness of representations. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, Sydney, Australia, Aug. 1991, 518–524.

GREINER, R. AND SCHUURMANS, D. 1992. Learning useful Horn approximations. In Proceedings of KR-92, San Mateo, CA, Oct. 1992, B. NEBEL, C. RICH, AND W. SWARTOUT, Eds. Morgan Kaufmann, San Mateo, CA.

GREINER, R., GROVE, A., AND SCHUURMANS, D. 1997. Learning Bayesian nets that perform well. In Uncertainty in Artificial Intelligence.

HALPERN, J. Y. 1990. An analysis of first-order logics of probability. Artificial Intelligence 46, 3 (Dec.), 311–350.

HECKERMAN, D. 1991. Probabilistic Similarity Networks. MIT Press, Cambridge, MA.

HECKERMAN, D. E. 1995. A Tutorial on Learning with Bayesian Networks. Tech. Rep. MSR-TR-95-06, Microsoft Research.

HERSKOVITS, E. H. AND COOPER, C. 1991. Algorithms for Bayesian belief-network precomputation. In Methods of Information in Medicine.

HOOS, H. H. AND STUTZLE, T. 1998. Evaluating Las Vegas algorithms—pitfalls and remedies. In Proc. Uncertainty in AI, San Francisco, 1998. Morgan Kaufmann, San Mateo, CA.

HORVITZ, E. J., SUERMONDT, H. J., AND COOPER, G. F. 1989. Bounded conditioning: flexible inference for decisions under scarce resources. In Proceedings of the Fifth Conference on Uncertainty in Artificial Intelligence (UAI-89), Windsor, Ontario, 1989. Morgan Kaufmann, San Mateo, CA, 182–193.

JAAKKOLA, T. S. AND JORDAN, M. I. 1996a. Computing upper and lower bounds on likelihoods in intractable networks. In Proc. Uncertainty in AI.

JAAKKOLA, T. S. AND JORDAN, M. I. 1996b. Recursive algorithms for approximating probabilities in graphical models. Tech. Rep. 9604 (June), MIT Computational Cognitive Science.

JENSEN, F. V., LAURITZEN, S. L., AND OLESEN, K. G. 1990. Bayesian updating in causal probabilistic networks by local computations. SIAM Journal on Computing 4, 269–282.

KAUTZ, H. AND SELMAN, B. 1996. Pushing the envelope: planning, propositional logic and stochastic search. In Proceedings AAAI-96, Menlo Park, Aug. 1996. AAAI Press/MIT Press, Cambridge, MA, 1194–1201.

KAUTZ, H., KEARNS, M., AND SELMAN, B. 1993. Reasoning with characteristic models. In AAAI-93, 34–39.

KHARDON, R. AND ROTH, D. 1994. Reasoning with models. In AAAI-94, 1148–1153.

KJÆRULFF, U. 1994. Reduction of computational complexity in Bayesian networks through removal of weak dependences. In Uncertainty in Artificial Intelligence: Proceedings of the Tenth Conference, Seattle, WA, 1994, R. L. DE MANTARAS AND D. POOLE, Eds.


KOLLER, D. AND PFEFFER, A. 1997. Learning probabilities for noisy first-order rules. In Proceedings of the 15th International Joint Conference on Artificial Intelligence (IJCAI), Nagoya, Japan, Aug. 1997.

KOLLER, D., LEVY, A., AND PFEFFER, A. 1997. P-CLASSIC: a tractable probabilistic description logic. In Proceedings of the 14th National Conference on Artificial Intelligence (AAAI), Providence, Rhode Island, Aug. 1997.

KONDRAK, G. AND VAN BEEK, P. 1997. A theoretical evaluation of selected backtracking algorithms. Artificial Intelligence 89, 365–387.

LAURITZEN, S. AND SPIEGELHALTER, D. J. 1988. Local computations with probabilities on graphical structures and their application to expert systems (with discussion). Journal of the Royal Statistical Society series B 50, 157–224. Reprinted in (Shafer and Pearl 1990).

LEPAR, V. AND SHENOY, P. P. 1998. A comparison of Lauritzen–Spiegelhalter, Hugin, and Shenoy–Shafer architectures for computing marginals of probability distributions. In Proc. Uncertainty in AI, San Francisco, 1998. Morgan Kaufmann, San Mateo, CA.

LEVESQUE, H. J. 1984. Foundations of a functional approach to knowledge representation. Artificial Intelligence 23, 155–212.

LEVESQUE, H. AND BRACHMAN, R. 1985. A fundamental tradeoff in knowledge representation and reasoning. In Readings in Knowledge Representation, Los Altos, CA, 1985, R. BRACHMAN AND H. LEVESQUE, Eds. Morgan Kaufmann, San Mateo, CA, 41–70.

LI, Z. AND D'AMBROSIO, B. 1994. Efficient inference in Bayes nets as a combinatorial optimization problem. International Journal of Approximate Reasoning 11, 1, 55–81.

LIU, C.-L. AND WELLMAN, M. P. 1997. On state-space abstraction for anytime evaluation of Bayesian networks. SIGART Bulletin 7, 2.

MAREK, W. AND TRUSZCZYNSKI, M. 1989. Relating autoepistemic and default logics. In Proceedings of the 1st International Conference on Principles of Knowledge Representation and Reasoning, Toronto, Canada, May 1989, R. J. BRACHMAN, H. J. LEVESQUE, AND R. REITER, Eds. Morgan Kaufmann, San Mateo, CA, 276–288.

MCALLESTER, D. A. 1989. Ontic: A Knowledge Representation System for Mathematics. MIT Press, Cambridge, MA.

MCCARTHY, J. 1977. Epistemological problems in artificial intelligence. In Proceedings of the Fifth International Joint Conference on Artificial Intelligence (IJCAI-77), Cambridge, MA, Aug. 1977. IJCAI.

MCCARTHY, J. 1980. Circumscription—a form of non-monotonic reasoning. Artificial Intelligence 13, 1–2 (April), 27–39.

MCCUNE, W. AND WOS, L. 1997. Otter—the CADE-13 competition incarnations. Journal of Automated Reasoning 18, 2 (April), 211–220.

MENGSHOEL, O. J. AND WILKINS, D. 1997. Abstraction and aggregation in belief networks. In AAAI-97 Workshop on Abstractions, Decisions and Uncertainty.

MITCHELL, T. M., KELLER, R. M., AND KEDAR-CABELLI, S. T. 1986. Explanation-based generalization: a unifying view. Machine Learning 1, 1, 47–80.

MITCHELL, T. M. 1997. Machine Learning. McGraw-Hill, New York.

MOORE, R. C. 1982. The role of logic in knowledge representation and commonsense reasoning. In Proceedings of the National Conference on Artificial Intelligence, Pittsburgh, PA, Aug. 1982, D. WALTZ, Ed. AAAI Press, Menlo Park, CA, 428–433.

NAGEL, E. AND NEWMAN, J. 1958. Gödel's Proof. New York University Press, New York.

NECHES, R., FIKES, R., FININ, T., GRUBER, T., PATIL, R., SENATOR, T., AND SWARTOUT, W. R. 1991. Enabling technology for knowledge sharing. AI Magazine, 37–51.

NGO, L. AND HADDAWY, P. 1995. Probabilistic logic programming and Bayesian networks. Lecture Notes in Computer Science 1023, 286–300.

NILSSON, N. J. 1986. Probabilistic logic. Artificial Intelligence 28, 1 (Feb.), 71–88.

PEARL, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA.

PELLETIER, F. J. 1986. Thinker. In Proceedings of the 8th International Conference on Automated Deduction, Oxford, UK, July 1986, J. H. SIEKMANN, Ed. vol. 230 of LNCS. Springer, Berlin, 701–702.

POOLE, D. 1993a. Average-case analysis of a search algorithm for estimating prior probabilities in Bayesian networks with extreme probabilities. In Proceedings IJCAI-93, Chambery, France, Aug. 1993, 606–612.

POOLE, D. 1993b. Probabilistic Horn abduction and Bayesian networks. Artificial Intelligence 64, 81–129.

POOLE, D., GOEBEL, R., AND ALELIUNAS, R. 1987. Theorist: a logical reasoning system for default and diagnosis. In The Knowledge Frontier: Essays in the Representation of Knowledge, New York, 1987, N. CERCONE AND G. MCCALLA, Eds. Springer, Berlin, 331–352.

POOLE, D., MACKWORTH, A., AND GOEBEL, R. 1998. Computational Intelligence: A Logical Approach. Oxford.

PRADHAN, M. AND DAGUM, P. 1996. Optimal Monte Carlo estimation of belief network inference. In Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence (UAI-96), San Francisco, Aug. 1996, E. HORVITZ AND F. JENSEN, Eds. Morgan Kaufmann, San Mateo, CA, 446–453.

PRADHAN, M., PROVAN, G., MIDDLETON, B., AND HENRION, M. 1994. Knowledge engineering for large belief networks. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, San Francisco, CA, July 1994, R. L. DE MANTARAS AND D. POOLE, Eds. Morgan Kaufmann, San Mateo, CA, 484–490.

RABINER, L. R. AND JUANG, B. H. 1986. An introduction to hidden Markov models. IEEE ASSP Magazine, 4ff.

REITER, R. 1978a. Deductive question-answering on relational data bases. In Logic and Data Bases, H. GALLAIRE AND J. MINKER, Eds. Plenum Press, New York, 149–177.

REITER, R. 1978b. On closed world data bases. In Logic and Data Bases, H. GALLAIRE AND J. MINKER, Eds. Plenum Press, New York, 55–76.

REITER, R. 1980. Equality and domain closure in first-order databases. Journal of the Association for Computing Machinery 27, 2 (April), 235–249.

REITER, R. 1987. Nonmonotonic reasoning. In Annual Review of Computing Sciences, vol. 2, Annual Reviews, Palo Alto, 147–187.

ROBINSON, J. A. 1965. A machine-oriented logic based on the resolution principle. JACM 12, 23–41.

ROTH, D. 1996. On the hardness of approximate reasoning. Artificial Intelligence Journal 82, 1–2 (April), 273–302.

RUDNICKI, P. 1992. An overview of the Mizar project. In Notes to a Talk at the Workshop on Types for Proofs and Programs.

SACERDOTI, E. D. 1973. Planning in a hierarchy of abstraction spaces. In IJCAI-73.

SARKAR, S. AND MURTHY, I. 1996. Constructing efficient belief network structures with expert-provided information. IEEE Transactions on Knowledge and Data Engineering 8, 1 (Feb.), 134–143.

SCOTT, A. C., CLAYTON, J. E., AND GIBSON, E. L. 1991. A Practical Guide to Knowledge Acquisition. Addison-Wesley, Reading, MA.

SELMAN, B. AND KAUTZ, H. 1996. Knowledge compilation and theory approximation. Journal of the ACM 43, 193–224.

SELMAN, B., LEVESQUE, H., AND MITCHELL, D. 1992. A new method for solving hard satisfiability problems. In Proceedings of the Twelfth National Conference on Artificial Intelligence, San Jose, July 1992, 440–446.

SHAFER, G. AND PEARL, J. 1990. Readings in Uncertain Reasoning. Morgan Kaufmann, Los Altos, CA.

SINGH, M. 1998. Learning Belief Networks. Ph.D. thesis, Dept. of Computer Science, Univ. of Pennsylvania.

SMITH, D. E. AND GENESERETH, M. R. 1985. Ordering conjunctive queries. Artificial Intelligence 26, 2 (May), 171–215.

SMITH, D. E., GENESERETH, M. R., AND GINSBERG, M. L. 1986. Controlling recursive inference. Artificial Intelligence 30, 3, 343–389.

SMYTH, P., HECKERMAN, D., AND JORDAN, M. I. 1997. Probabilistic independence networks for hidden Markov probability models. Neural Computation 9, 2, 227–269.

SRINIVAS, S. 1994. A probabilistic approach to hierarchical model-based diagnosis. Tech. Rep. KSL-94-14 (Feb.), Knowledge Systems Laboratory, Stanford Univ.

SUERMONDT, H. J. AND COOPER, G. F. 1991. Initialization for the method of conditioning in Bayesian belief networks (research note). Artificial Intelligence 50, 1 (June), 83–94.

SWAMI, A. AND GUPTA, A. 1988. Optimization of large join queries. SIGMOD Record (ACM Special Interest Group on Management of Data) 17, 3 (Sept.), 8–17.

TOWELL, G. G. AND SHAVLIK, J. W. 1993. Extracting refined rules from knowledge-based neural networks. Machine Learning 13, 71–101.

TREITEL, R. J. AND GENESERETH, M. R. 1987. Choosing orders for rules. Journal of Automated Reasoning 3, 4 (Dec.), 395–432.

TURING, A. M. 1936. On computable numbers, with an application to the Entscheidungsproblem. Proc. London Math. Soc. 2, 42, 230–265. Corrections in vol. 2, 43, 544–546.

VALIANT, L. G. 1979. The complexity of enumeration and reliability problems. SIAM Journal on Computing 8, 3, 410–421.

VAN DER LANS, R. F. 1989. The SQL Standard: A Complete Reference. Prentice Hall, Englewood Cliffs, NJ. Translated by Andrea Gray.

VAN ENGELEN, R. A. 1997. Approximating Bayesian belief networks by arc removal. IEEE PAMI 19, 9 (Aug.), 916–920.

WELLMAN, M. P. 1990. Fundamental concepts of qualitative probabilistic networks. Artificial Intelligence 44, 3, 257–303.

WELLMAN, M. 1994. Abstraction in Probabilistic Reasoning. Tech. Rep., Univ. of Michigan. Tutorial prepared for the Summer Institute on Probability in AI, Corvallis, OR, July 1994. See http://ai.eecs.umich.edu/people/wellman/tut/Abstraction.html.

WELD, D. S. AND DE KLEER, J. 1990. Readings in Qualitative Reasoning about Physical Systems. Morgan Kaufmann, Los Altos, CA.

WEBB, G. I., WELLS, J., AND ZHENG, Z. 1999. An experimental evaluation of integrating machine learning with knowledge acquisition. Machine Learning 35, 5–21.

ZHANG, N. L. AND POOLE, D. 1994. A simple approach to Bayesian network computations. In Proc. of the 10th Canadian Conference on Artificial Intelligence, Banff, Alberta, Canada, May 1994.

ZILBERSTEIN, S. 1993. Operational Rationality through Compilation of Anytime Algorithms. Tech. Rep. CSD-93-743, Univ. of California, Berkeley.

ZILBERSTEIN, S. 1996. Resource-bounded reasoningin intelligent systems. ACM Computing Surveys28, 4 (Dec.), 15.

Received December 1998; revised June 1999; accepted March 2000
