What makes RABBIT run?

Int. J. Man-Machine Studies (1984) 21, 333-352


MICHAEL DAVID WILLIAMS

Applied Artificial Intelligence, IntelliGenetics, Menlo Park, California 94025, U.S.A.

Once one completes the design and construction of a novel interface that claims to embody a new interface paradigm, one is confronted with two problems: (1) Is the interface better than what exists? (2) If it is better, what makes it better? The answer to the second question is important if one has any hope of extending the paradigm. Paradoxically, it is also important to have the answer to the second question in order to address the first. This is because new paradigms often derive their value from introducing new functionality into the system and thus highlight criteria not previously recognized. This is exactly the case with RABBIT, a novel database retrieval interface we have constructed. This article describes some recent work on understanding where the apparent power of the interface comes from.

Introduction

RABBIT is based on a notion of retrieval by reformulation and provides a user interface that aids users in formulating a query (Williams & Tou, 1982; Tou et al., 1982; Tou & Williams, 1983; Tou, 1982). The paradigm of retrieval by reformulation was derived from a psychological theory of human remembering (Williams & Hollan, 1981; Williams, 1981; Norman & Bobrow, 1979). To make a query in RABBIT, the user interactively refines partial descriptions of his target item(s) by criticizing successive example (and counter-example) instances that satisfy the current partial description. The example instance and query are presented in "active form" windows on a high-resolution bit-map display. Critiquing options (such as "require" or "prohibit" a certain value) are presented to the user by means of a pop-up menu as the user "points" at parts of the instance of interest. Each critiquing action causes RABBIT to reformulate the query and display the new query to the user. Instances from the database are presented to the user from a perspective inferred from the user's query description and the structure of the knowledge base. Among other things, this constructed perspective reminds users of likely terms to use in their descriptions, enhances their understanding of the meaning of the given terms, and prevents them from creating certain classes of semantically improper query descriptions. RABBIT particularly helps users who approach the database with only a vague idea of what they want and therefore need to be guided in the (re)formulation of their queries. RABBIT is also of substantial value to casual users who have limited knowledge of a given database or who must deal with a multitude of databases.

An example RABBIT interaction

This section describes a typical RABBIT III interaction. RABBIT III is programmed in Interlisp-D running on the Xerox 1100 series machines and makes extensive use of high-resolution bit-map graphics and a pointing device called a "mouse". As it now stands, RABBIT is a "proof of concept" program running over a small database. The database is represented in a semantic network (a simplification of KL-ONE). The implementation of KL-ONE we are using is written in such a way that interfacing a traditional relational database underneath the semantic network would be a comparatively straightforward task (defining eight functions that make tuples from various relations look like KL-ONE instances). In such a configuration one can conceive of the semantic network as providing many of the services traditionally carried out by a data dictionary. In the long term the vision is to have RABBIT as a query assistant on a local machine, with a local KL-ONE conceptual structure transforming the user's queries into acceptable forms to query remote databases. What are now KL-ONE instances will be the data objects of these remote databases.

† The bulk of the work reported in this paper was accomplished while the author was in the Cognitive and Instructional Sciences Group at Xerox Palo Alto Research Center.

0020-7373/84/100333 + 20 $03.00/0 © 1984 Academic Press Inc. (London) Limited

Figure 1 shows RABBIT in the midst of a retrieval interaction. The interface consists of a constellation of three windows. The one in the upper right-hand corner (The Query: "Personnel-search") contains the query, a description of the object(s) the user is trying to find. The boldface terms are categories into which the item falls. The indented italicized terms are roles or fields (e.g. age, employer) on which the user has commented. The terms indented beneath the roles are either specific values the role should satisfy, predicates (e.g. >21) the acceptable values must satisfy, or embedded descriptions that recursively describe the acceptable values. Thus, the target item is some sort of person who happens to be an employee, with an age >21, who works for some corporation incorporated in either California or Delaware. The window in the lower left-hand corner ("James Tancon") contains an example instance from the database described from the perspective of being a person and an employee. The menu next to the instance window ("14 matches") contains a scrollable list of items from the database that satisfy the query. The principal form of interaction the user has with RABBIT is to point at various parts of the instance description and critique them. Figure 1 shows RABBIT in the midst of just such a critique, with a pop-up menu of critique options (require, prohibit, alternatives, describe, predicate) that comes up when the user points to a part of the instance.

FIG. 1. Sample RABBIT display.
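To make the structure of the Fig. 1 query concrete, here is a minimal Python sketch of how such a description might be represented and matched against instances. It is not RABBIT's implementation (RABBIT works over KL-ONE structures in Interlisp-D); the `Description` class, the role name `stateOfIncorporation`, and the sample instances are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Description:
    """Hypothetical stand-in for a RABBIT query description (not KL-ONE)."""
    categories: frozenset                       # boldface categories
    roles: dict = field(default_factory=dict)   # role name -> restriction


def satisfies(value: Any, restriction: Any) -> bool:
    """A restriction is an embedded Description, a predicate,
    a set of OR-alternatives, or a specific value."""
    if isinstance(restriction, Description):
        return isinstance(value, dict) and matches(value, restriction)
    if callable(restriction):
        return restriction(value)
    if isinstance(restriction, (set, frozenset)):
        return value in restriction
    return value == restriction


def matches(instance: dict, query: Description) -> bool:
    """True if the instance is in every category and meets every role restriction."""
    if not query.categories <= instance["categories"]:
        return False
    return all(role in instance and satisfies(instance[role], r)
               for role, r in query.roles.items())


# The Fig. 1 query: a person who is an employee, age > 21, whose employer is
# a corporation incorporated in California or Delaware.
query = Description(
    categories=frozenset({"Person", "Employee"}),
    roles={
        "age": lambda age: age > 21,
        "employer": Description(
            categories=frozenset({"Corporation"}),
            roles={"stateOfIncorporation": {"California", "Delaware"}},
        ),
    },
)

# Hypothetical instances; names and values are invented for illustration.
acme = {"name": "Acme", "categories": {"Corporation"}, "stateOfIncorporation": "Delaware"}
db = [
    {"name": "Ann", "categories": {"Person", "Employee"}, "age": 34, "employer": acme},
    {"name": "Bob", "categories": {"Person", "Employee"}, "age": 19, "employer": acme},
    {"name": "Cy", "categories": {"Person"}, "age": 40},
]

print([inst["name"] for inst in db if matches(inst, query)])  # ['Ann']
```

Embedded descriptions are handled by simple recursion, which mirrors the text's point that the values under a role may themselves be full descriptions.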

The sample database we are running RABBIT over consists of a potpourri of concepts: products, people, businesses, locations, recipes, and miscellaneous others. It is a fictitious database; its information has been fabricated primarily to maximize the heterogeneity and interconnectedness of the database.

A user initiates a query in RABBIT by announcing to a notebook facility (not described in this article) that he wishes to create a new query. The system responds by asking the user to type in a basic category. If the term is recognized, RABBIT constructs an initial query and presents an instance that satisfies the description along with a browsing menu of matching instances.† For example, if the user were to type "Product", the resulting query window would look much like the display in Fig. 2. Note that even in the restricted experimental database we are using, the system knows far more about the Apple-II-plus than just the cost and manufacturer (this database "knows" about 30 attributes of an Apple-II-plus).

RABBIT is presenting this item only from the perspective of being a Product. The dynamic construction of perspectives will be discussed more fully below. RABBIT also presents a list of alternative matching instances in a scrollable menu to the right of the instance window. A new instance can be brought up by pointing to the instance name in the list of matching instances.

The basic method a user employs to refine a query is to point at various properties of the instance presented and critique them. To do so the user uses a pointing device (a mouse on our machines) to select some aspect of the instance presented. RABBIT reacts by presenting a menu of critique commands appropriate to the aspect selected. This use of "active forms", where segments of text or icons have a behavior that can be elicited by pointing at the object or its parts, is common in Interlisp-D and Smalltalk. Six critiques are available:

REQUIRE: adds the attribute to the query.

PROHIBIT: applies the relative complement of the attribute to the query.

ALTERNATIVES: suggests a series of alternative values and inserts those selected into the query.

DESCRIBE: allows the recursive specification of an embedded query.

SPECIALIZE: refines the generic category by allowing a user to select one or more subconcepts.

PREDICATE: allows the user to apply a predicate to the value of an attribute (e.g. >, <).

† If the term is not recognized, RABBIT creates a default query based on the sought-after object necessarily being an Entity. All instances in RABBIT are Entities.

FIG. 2. Initial "Product" query.
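The critiques can be thought of as functions that take a query and return a reformulated one, with the pop-up menu restricted to the critiques that apply to the part of the instance selected. The sketch below assumes a hypothetical encoding in which a query maps each role to a restriction tuple. The menus follow the figures (four critiques when a category is selected, five when a role is selected), and only the role critiques other than DESCRIBE are implemented.

```python
from enum import Enum


class Critique(Enum):
    REQUIRE = "require"
    PROHIBIT = "prohibit"
    ALTERNATIVES = "alternatives"
    DESCRIBE = "describe"
    SPECIALIZE = "specialize"
    PREDICATE = "predicate"


# Menus as they appear in the figures: pointing at a boldface category offers
# four critiques; pointing at a role (field) offers five.
MENUS = {
    "category": (Critique.REQUIRE, Critique.PROHIBIT,
                 Critique.SPECIALIZE, Critique.ALTERNATIVES),
    "role": (Critique.REQUIRE, Critique.PROHIBIT, Critique.ALTERNATIVES,
             Critique.DESCRIBE, Critique.PREDICATE),
}


def menu_for(selection_kind: str) -> tuple:
    """Only critiques applicable to the selected part of the instance are offered."""
    return MENUS[selection_kind]


def reformulate(query: dict, critique: Critique, role: str, arg) -> dict:
    """Return a new query (role -> restriction) after one role critique.

    Hypothetical encoding: ("=", v), ("!=", v), ("in", frozenset), or (op, v)."""
    if critique not in menu_for("role"):
        raise ValueError(f"{critique.name} does not apply to a role")
    new = dict(query)
    if critique is Critique.REQUIRE:
        new[role] = ("=", arg)
    elif critique is Critique.PROHIBIT:
        new[role] = ("!=", arg)
    elif critique is Critique.ALTERNATIVES:
        new[role] = ("in", frozenset(arg))
    elif critique is Critique.PREDICATE:
        op, value = arg
        new[role] = (op, value)
    else:  # DESCRIBE would open an embedded query; not sketched here
        raise NotImplementedError(critique.name)
    return new


q = reformulate({}, Critique.ALTERNATIVES, "manufacturer", {"Apple", "Atari", "Xerox"})
q = reformulate(q, Critique.PREDICATE, "cost", ("<", 20000))
print(q["cost"])  # ('<', 20000)
```

Returning a new query from each critique, rather than mutating it, keeps every intermediate reformulation available, which would make an UNDO facility straightforward.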

Not all critiques are applicable to all attributes of an example. RABBIT controls this by limiting the critique functions presented in the pop-up menu to those acceptable for the attribute the user has indicated.† The pop-up menu in Fig. 1 has only five of the six possible critique functions because it was generated by selecting one of the roles or fields of the example instance presented (rather than one of the boldface conceptual categories). Selecting REQUIRE on the category ComputerRelatedProduct (Fig. 2) causes RABBIT to reformulate the user's initial query, adding the information that the items sought are indeed computer-related products (see Fig. 3). This is the basic cycle of reformulation: each critique results in RABBIT incrementally refining the query.

† This is a common aspect of the use of "active forms".

FIG. 3. REQUIRE ComputerRelatedProduct.

RABBIT does have a limited understanding of the semantics of a query. For example, if we were to point to "computer related product" and this time select the critique PROHIBIT, the reformulation results in the query shown in Fig. 4. RABBIT is smart enough to realize that no item can be both a computer-related-product and not-a-computer-related-product. It presumes that the last thing the user said must be what was intended and modifies the query accordingly, in this case by replacing the conceptual category "computer-related-product" with its relative complement "not-computer-related-product".
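The "last thing said wins" rule can be sketched by storing, for each category, whether it is required or prohibited, so that a new critique on a category overwrites the old one. This is an illustrative assumption about the bookkeeping, not RABBIT's actual code.

```python
def critique_category(query: dict, category: str, required: bool) -> dict:
    """query maps category -> True (REQUIRE) or False (PROHIBIT).

    No item can be both in a category and in its relative complement, so,
    presuming the last critique is what the user intended, a new critique on
    a category simply replaces any earlier one on the same category."""
    new = dict(query)
    new[category] = required
    return new


def render(query: dict) -> list:
    """Label the query's categories the way the query window shows them."""
    return [c if req else f"NOT-{c}" for c, req in query.items()]


q = {"Product": True}
q = critique_category(q, "ComputerRelatedProduct", True)   # REQUIRE (Fig. 3)
q = critique_category(q, "ComputerRelatedProduct", False)  # PROHIBIT (Fig. 4)
print(render(q))  # ['Product', 'NOT-ComputerRelatedProduct']
```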

On occasion one wants to say something more precise than one of the conceptual categories given. For example, we might want to specify what kind of computer-related-product we have in mind. In this case, we select the SPECIALIZE command. RABBIT generates a menu of the kinds of computer-related-products it knows about. We select "computer" and the query is again reformulated (Fig. 5), again correcting the semantics of the query, since any computer must be a computer-related-product in RABBIT's model of the world.
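Specialization relies on the taxonomy in the semantic network: choosing a subconcept commits the query to all of its superconcepts as well. The sketch below uses a hypothetical fragment of the product taxonomy (loosely following the category names in the figures) and shows how requiring Computer overrides an earlier NOT-ComputerRelatedProduct.

```python
# Hypothetical taxonomy fragment: concept -> parent. All instances are Entities.
PARENT = {
    "Product": "Entity",
    "ComputerRelatedProduct": "Product",
    "OEM-Product": "Product",
    "RetailProduct": "Product",
    "Computer": "ComputerRelatedProduct",
    "CPU": "ComputerRelatedProduct",
    "Disk": "ComputerRelatedProduct",
    "Display": "ComputerRelatedProduct",
}


def ancestors(concept: str) -> list:
    """All superconcepts of a concept, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def subconcepts(concept: str) -> list:
    """The SPECIALIZE menu: immediate subconcepts of a category."""
    return [c for c, parent in PARENT.items() if parent == concept]


def specialize(query: dict, sub: str) -> dict:
    """query maps category -> True (required) / False (prohibited).

    Requiring `sub` implies every superconcept, so any prohibition of a
    superconcept is overridden (the last thing said wins)."""
    new = dict(query)
    for anc in ancestors(sub):
        if anc != "Entity":  # Entity is implicit in every query
            new[anc] = True
    new[sub] = True
    return new


q = {"Product": True, "ComputerRelatedProduct": False}  # after PROHIBIT
print(subconcepts("ComputerRelatedProduct"))  # ['Computer', 'CPU', 'Disk', 'Display']
print(specialize(q, "Computer"))
# {'Product': True, 'ComputerRelatedProduct': True, 'Computer': True}
```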

FIG. 4. Maintaining semantic consistency.

Selecting an attribute such as "manufacturer", we get a slightly different set of critique commands: REQUIRE, PROHIBIT, ALTERNATIVES, DESCRIBE, and PREDICATE. In this case we do not want simply to require or prohibit Apple as the manufacturer, but would rather specify some alternative values. Executing the ALTERNATIVES command produces a menu of alternative values, from which we select several, resulting in the query shown in Fig. 6.† The description now specifies a computer built by Apple, Xerox, or Atari.

The PREDICATE critique allows the application of simple predicates to values [e.g. >, <, = to numerical values, same-as (i.e. a linking variable) to any entity, ...]. Thus, through a cascading menu scheme similar to those described above, supplemented by a type-in capability, we might specify that we want a computer costing less than $20,000, as presented in Fig. 7. At this point, suppose we have run out of useful critiques with the instance described as it is. We can get a new instance description (though not necessarily a new instance) by pointing to the query and commanding a new retrieval. The result is a list of new matching instances along with the same instance presented from an extended perspective (Fig. 8).‡ The interaction can now proceed with the user critiquing the new attributes in a manner similar to that described above.

† For brevity, the interactions described in this article often leave out steps. For example, in certain cases the ALTERNATIVES command actually steps through another menu specifying a set operator (and, or, etc.).

‡ If the description/query had excluded the Apple-II-plus, we would have obtained a new instance.

FIG. 5. SPECIALIZE to computer.

RABBIT is presenting new information about the same instance (an Apple-II-plus) because the user is committed to talking about a computer. RABBIT's knowledge base recognizes that, as a computer, an Apple-II-plus has an operating system, a CPU, a disk (in this case unspecified, so the generic descriptor "Disk" fills the slot), memory, and a display. In the prior presentation the user had only asked for a product, so the Apple-II-plus was described as a product, having a cost and manufacturer. We refer to this technique of presenting only the attributes associated with the generic categories the user is committed to in his query as "filtering".
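Filtering can be sketched as taking the union of the attributes attached to each category the query is committed to and showing only those. The attribute lists and instance values below are illustrative assumptions loosely modeled on the figures; the `weight` attribute is invented to show something being filtered out.

```python
# Hypothetical attribute lists per generic category (not RABBIT's knowledge base).
ATTRIBUTES = {
    "Entity": ["name"],
    "Product": ["cost", "manufacturer"],
    "Computer": ["operatingSystem", "cpu", "disk", "memory", "display"],
}


def perspective(committed: list) -> list:
    """Attributes of the categories the query is committed to, in order, no duplicates."""
    attrs = []
    for category in committed:
        for attr in ATTRIBUTES.get(category, []):
            if attr not in attrs:
                attrs.append(attr)
    return attrs


def present(instance: dict, committed: list) -> dict:
    """Filtering: show only the attributes in the current perspective."""
    return {attr: instance.get(attr) for attr in perspective(committed)}


# Illustrative instance record carrying more attributes than either view shows.
apple = {"name": "Apple-II-plus", "cost": 1400, "manufacturer": "Apple",
         "disk": "Disk", "display": "TV", "memory": "Apple-48K",
         "cpu": "6502", "operatingSystem": "AppleSoft", "weight": 12}

print(present(apple, ["Entity", "Product"]))
# {'name': 'Apple-II-plus', 'cost': 1400, 'manufacturer': 'Apple'}
```

Adding "Computer" to the committed categories extends the view with the five computer attributes, while `weight` stays hidden in both views.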

FIG. 6. ALTERNATIVES to manufacturer.

One major limitation of ALTERNATIVES as a critiquing command is that it provides an extensional expression of alternative values to fill the attribute specified. There are two practical problems with this: the list of alternatives might exceed any reasonable menu presentation limit (there might be 10,000 alternative values), and the user might not even understand some or all of the terms listed. Consider, for example, asking for alternatives for filling the disk attribute. The alternatives menu might look as in Fig. 9, listing Atari-810, Xerox-10, and Xerox-29, none of which the user understands. One solution to these problems is for the user to ask to describe the value of the attribute intensionally. RABBIT reacts to this critique command with the implicit observation that producing an intensional description of a disk is just a recursion on the whole query process underway. It presents an "embedded" set of RABBIT query windows, initializing the query with a default perspective inferred from the restriction that the attribute must be filled with some entity that is necessarily a "Disk" (see Fig. 10).

The reformulation of this embedded query proceeds in the same manner as that described above. When the specification is complete, the user returns to the original query interaction and RABBIT embeds the results of the description of the disk into a reformulation of the basic query. This reformulation also modifies the perspective from which the instance is presented. In particular, because the user has expressed an interest in the average-access-time and the capacity of the disk, from now on RABBIT displays not only the name of the disk associated with the retrieved computer but also its average-access-time and capacity (see Fig. 11). We call this technique "compression".
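Compression can be sketched as remembering which sub-attributes the user mentioned in an embedded DESCRIBE query and folding them into the parent instance's display. The `described` bookkeeping and the disk data below are hypothetical.

```python
def compressed_view(instance: dict, perspective: list, described: dict) -> dict:
    """Render an instance for the instance window.

    perspective: top-level roles currently shown (see filtering).
    described:   role -> sub-attributes the user mentioned in an embedded
                 DESCRIBE query (hypothetical bookkeeping); these are
                 "compressed" into the parent view."""
    view = {}
    for role in perspective:
        value = instance.get(role)
        if isinstance(value, dict):  # an embedded instance, e.g. a disk
            if role in described:
                view[role] = {"name": value["name"],
                              **{a: value.get(a) for a in described[role]}}
            else:
                view[role] = value["name"]  # default: show only its name
        else:
            view[role] = value
    return view


# Hypothetical computer and disk; names and numbers are invented.
disk = {"name": "disk-1", "averageAccessTime": 30, "capacity": 100, "interface": "x"}
computer = {"name": "pc-1", "cost": 1400, "disk": disk}

print(compressed_view(computer, ["name", "disk"], {}))
# {'name': 'pc-1', 'disk': 'disk-1'}
print(compressed_view(computer, ["name", "disk"],
                      {"disk": ["averageAccessTime", "capacity"]}))
# {'name': 'pc-1', 'disk': {'name': 'disk-1', 'averageAccessTime': 30, 'capacity': 100}}
```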

ANALYSIS QUESTIONS

As users become more experienced with a database and query system, their information demands increase. As a result, one can afford to increase the complexity of the system to extend its expressiveness. In the form described above, RABBIT readily handles a broad class of user queries, in particular any question that can be characterized as a description of a collection of objects. In addition, because of its inherent browsing motif, RABBIT also supports "questions" of the form "What is the capacity of the Atari-810 disk?". One class of questions that RABBIT (as described above) does not support is what I call analysis questions. I take analysis questions to be those that contrast and compare a collection of objects across some dimensions. The simplest form of such a comparison would be to form a table of the matching items displaying the values of each along selected attributes. RABBIT III has a series of functions available to the experienced user to carry out such "analysis" tasks.

FIG. 7. PREDICATE (select <) to cost.

The analysis functions are obtained by pointing at the title frame of the list of matching examples, buttoning the mouse, and selecting a function to apply. Only applicable functions are made available. For example, the "map analysis" function appears only if the objects in question have latitude and longitude attributes within the RABBIT perspective applicable at the time. Examples of analysis functions are the following.

Basic table: prompts a user to point at attributes of interest in the instance presented, then presents a table of all the matching examples that satisfy the present query along with their values in the selected dimensions.

Comparison table: similar to basic table except that it takes the displayed instance as a baseline and compares each other matching example with it for similarity along the selected dimensions. Figure 12 shows such a comparison over the computers described in our example query. The attributes that are "similar" are highlighted (reverse-videoed) in the table. The matching examples are sorted such that the most similar are displayed nearest the displayed instance. The similarity function in this case is a simple test of equality, though one can imagine arbitrarily complex functions to satisfy whatever theory of "similarity" one might wish to exploit.

Scatter plot: prompts the user to select two attributes that can be interpreted numerically and builds a scatter plot of the matching instances along those dimensions.

Map analysis: generates a high-resolution map of the piece of the world that contains the matching instances, and displays the coastline of this piece of the world along with the matching instances. The coastline information is drawn from a world map database.†

Path grapher: displays partial network diagrams of the relationships among instances in the database. The displayed instance is taken as the starting instance and the relationships are specified by the user as a sequence of KL-ONE roles (or path). Thus, one might follow the employees role from a vice-president to generate an organization chart of that person's organization.

† This is done by integrating RABBIT with a substantial piece of software called the Geographic Information System, built by Richard Burton and in local use at Xerox PARC.

FIG. 8. New RETRIEVE on query.

FIG. 9. ALTERNATIVES to disk.
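Two of these functions are simple enough to sketch directly. The comparison table below uses equality as its similarity test, flags similar cells (standing in for reverse video), and sorts rows most-similar first; the path grapher follows a user-given sequence of roles outward from the starting instance. The data layout (instances as dicts, role values as lists of instance names) and all sample data are assumptions.

```python
def comparison_table(base: dict, others: list, attrs: list) -> list:
    """Compare each matching example with the displayed (base) instance.

    Similarity is a simple equality test; each cell is (value, is_similar),
    where is_similar marks a highlighted cell. Rows are sorted most-similar
    first; Python's sort is stable, so ties keep their original order."""
    rows = []
    for inst in others:
        cells = {a: (inst.get(a), inst.get(a) == base.get(a)) for a in attrs}
        score = sum(similar for _, similar in cells.values())
        rows.append((score, inst["name"], cells))
    rows.sort(key=lambda row: row[0], reverse=True)
    return [(name, cells) for _, name, cells in rows]


def path_graph(db: dict, start: str, path: list) -> list:
    """Follow a sequence of roles (a path) from the starting instance and
    return the (source, role, target) edges to draw."""
    edges, frontier = [], [start]
    for role in path:
        next_frontier = []
        for node in frontier:
            for target in db.get(node, {}).get(role, []):
                edges.append((node, role, target))
                next_frontier.append(target)
        frontier = next_frontier
    return edges


# Hypothetical data for both functions.
base = {"name": "A", "cpu": "6502", "memory": "48K", "display": "TV"}
others = [
    {"name": "B", "cpu": "6502", "memory": "48K", "display": "Monitor"},
    {"name": "C", "cpu": "Z-80", "memory": "64K", "display": "TV"},
    {"name": "D", "cpu": "6502", "memory": "48K", "display": "TV"},
]
table = comparison_table(base, others, ["cpu", "memory", "display"])
print([name for name, _ in table])  # ['D', 'B', 'C']

org = {"vp": {"employees": ["m1", "m2"]}, "m1": {"employees": ["e1"]}}
print(path_graph(org, "vp", ["employees", "employees"]))
# [('vp', 'employees', 'm1'), ('vp', 'employees', 'm2'), ('m1', 'employees', 'e1')]
```

Swapping the equality test for a different similarity function only requires changing how `cells` is computed, which matches the text's remark that arbitrarily complex similarity theories could be plugged in.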

The functions above are intended to be illustrative of some broad classes of analysis and presentation. For example, one can readily imagine a series of plotting functions that create histograms, pie charts, labeled and unlabeled scatter plots, etc.

One interpretation of what these analysis functions do is that they give a user a view or slice through a segment of the database. They allow the user to examine the aggregate properties of substantive portions of the database directly in a variety of visual formats. These kinds of capabilities have traditionally been bound up in what the database management industry calls "report generators". One of the principal directions in the planned extensions of RABBIT is to integrate this analysis activity with querying and browsing, much as early versions of RABBIT have integrated querying and browsing into a unified activity. For example, users ought to be able to locate a point in the scatter plot and have the instance represented there displayed in the instance window, or specify a region on an analysis map to reformulate the query.


FIG. 10. DESCRIBE to disk.

EXEMPLAR SEARCH

One potential factor limiting the effectiveness of RABBIT-like interfaces is that they make extensive use of intermediate retrieval cycles. If it were to take minutes to retrieve examples satisfying the reformulated query, a RABBIT-like system would be ponderous at best. Just as the original design of RABBIT was based on an imitation of human retrieval processes (Williams & Tou, 1982; Williams, 1981), we have returned to observational studies to find a fix for this problem. What we found was the extensive use of exemplars as intermediate instances in the retrieval process (Smith & Medin, 1981).

In exemplar search, RABBIT has a short list of exemplars pre-stored with each generic concept in the semantic network. When a retrieval cycle is initiated, RABBIT first examines these local exemplars. If any satisfies the user's description, it is displayed as the example in the instance window, and the regular retrieval to find all of the matching examples in the database is started up as a background process. As a result, the user can continue reformulating the query while RABBIT is carrying out one of its intermediate retrievals.
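A minimal sketch of exemplar search, assuming a query is a Python predicate and the exemplars are a short list stored with the concept: the example to display comes from the exemplars when possible, while the full retrieval runs on a background thread via the standard `concurrent.futures` module. The fallback when no exemplar matches is an assumption; the text does not describe it.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


def start_retrieval(query: Callable[[dict], bool], exemplars: list,
                    database: list, executor: ThreadPoolExecutor):
    """Return (example, future) for one retrieval cycle.

    The example comes from the concept's pre-stored exemplars when one matches,
    so it can be shown at once; the full "find all" retrieval runs in the
    background and its future resolves to every matching instance."""
    example: Optional[dict] = next((e for e in exemplars if query(e)), None)
    if example is None:
        # Assumption (not described in the text): with no matching exemplar,
        # fall back to a "find one" scan of the database.
        example = next((x for x in database if query(x)), None)
    everything: Future = executor.submit(lambda: [x for x in database if query(x)])
    return example, everything


# Hypothetical data: one exemplar pre-stored with the concept.
database = [{"name": "a", "cost": 900}, {"name": "b", "cost": 25000},
            {"name": "c", "cost": 1400}]
exemplars = [{"name": "c", "cost": 1400}]

with ThreadPoolExecutor(max_workers=1) as pool:
    example, pending = start_retrieval(lambda x: x["cost"] < 20000,
                                       exemplars, database, pool)
    print(example["name"])  # prints c, available before the full retrieval finishes
    # ... the user keeps critiquing here ...
    print([x["name"] for x in pending.result()])  # ['a', 'c']
```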


FIG. 11. Compression in dynamic perspectives.

Exemplar search is only one example of the RABBIT paradigm's capacity to exploit "find one" search techniques. Retrieval algorithms for such techniques are known to be substantially faster than the more general "find all" techniques, since the search can stop as soon as a single match is found.
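The difference between "find one" and "find all" shows up in how many instances must be examined. This sketch counts predicate evaluations over a toy collection; it illustrates early termination only and says nothing about RABBIT's actual retrieval algorithms.

```python
class CountingPredicate:
    """Wraps a predicate and counts how many instances it has examined."""

    def __init__(self, pred):
        self.pred = pred
        self.calls = 0

    def __call__(self, item) -> bool:
        self.calls += 1
        return self.pred(item)


def find_one(pred, items):
    """Stop at the first match (None if there is none)."""
    return next((x for x in items if pred(x)), None)


def find_all(pred, items):
    """Examine every item."""
    return [x for x in items if pred(x)]


items = range(1, 1001)
one = CountingPredicate(lambda n: n % 2 == 0)
every = CountingPredicate(lambda n: n % 2 == 0)
print(find_one(one, items), one.calls)            # 2 2
print(len(find_all(every, items)), every.calls)   # 500 1000
```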

FIG. 12. Comparison table analysis.

Retrieval by reformulation

It is important to recognize that there are a variety of levels at which one can conduct interface design. The design in RABBIT focuses on the conceptual interaction between the user and the machine as opposed to the details of menu regimes (e.g. cascade, pop-up versus fixed, choice of command names, graphic esthetics, etc.). Further, notice that RABBIT addresses interface design for a specific task: information browsing, exploration, and retrieval. Though there are implications of the RABBIT design that transcend this task domain, little attempt has been made to establish design principles that apply across applications like word processing, computer programming, or device operation. In addition, certain characteristics of the RABBIT design are taken for granted, though they have occurred in a number of other systems. In particular, the basic paradigm of active forms, in which objects, forms and parts of forms carry their own behavior that can be elicited by pointing at them, is widely used throughout Interlisp-D, Smalltalk, and other display-oriented software environments. The essential characteristics of RABBIT are embodied in its use of the paradigm of retrieval by reformulation. The fundamental characteristics of retrieval by reformulation are the following.

Retrieval by reconstructed descriptions. The user makes a query by describing the object he is seeking (as opposed to network browsing schemes such as ZOG and PDB, where the user finds things by traversing a simple network of connections from object to object).

Interactive construction of queries. In some sense there is no query language, as there is with the traditional relational query languages such as SQUARE, SQL or QBE.† A user does not have to compose a precise query before using RABBIT; rather, the query is created in an interaction between the user and RABBIT.

Critique of example instances. This is the core of the retrieval by reformulation paradigm, embodied in the six critique commands (REQUIRE, PROHIBIT, etc.). This aspect of RABBIT gives the user a template for the type of object he wants to describe, a vocabulary he can be certain that RABBIT understands, and access to additional information and terminology within the database (e.g. through the ALTERNATIVES and DESCRIBE commands).

Dynamic perspectives. Perspectives serve four main functions in the RABBIT interface: they are used to control the amount of information presented to the user, they enhance semantic resolution, they enforce a certain class of semantic consistency, and they facilitate the use of highly interconnected, non-uniformly structured databases.

† Of course, there is one in the sense of the Boolean expression scheme that underlies the constructed description in the query window. Notice that the user does not have to spend several hours learning this Boolean expression scheme. All he has to learn is to differentiate the six commands presented to him any time he tries to critique a portion of the instance presented. Basic UNDO facilities even permit an almost playful exploration of the implications of each of these commands.

One last comment on level of design: the particular commands (REQUIRE, PROHIBIT, etc.), the techniques used in control of dynamic perspectives (filtering, compression, embedded defaults, etc.), and the details of information presentation are all particular heuristic solutions within the paradigm of retrieval by reformulation. For example, if one were to extend the critique commands or to carry out a massive substitution of those used in RABBIT III, I would consider it a reasonable variation of the basic paradigm. The critique commands are a sub-design problem: providing a compact set of critiques that maximizes query expressiveness while meeting some minimum level of comprehensibility to casual users. For example, PROHIBIT is not the logical not but rather a relative complement, where the set complemented is understood by RABBIT from the context of the query under construction. Traditional quantification is handled in the application of set operators (not shown in the example above) or defaulted by RABBIT. These commitments are tactical in that they must be made to build a system, and reasonable choices must be made for the system to work. However, they are not (in and of themselves) retrieval by reformulation.
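The distinction between PROHIBIT as a relative complement and a logical not can be seen with plain Python sets. The universe and category memberships below are a hypothetical toy example; the context set stands in for whatever RABBIT infers from the query under construction.

```python
def prohibit(context: set, excluded: set) -> set:
    """Relative complement: members of the query's context not in `excluded`."""
    return context - excluded


# A toy universe (hypothetical instances) spanning several kinds of entity.
products = {"Apple-II-plus", "Atari-800", "Toaster"}
people = {"Jones", "Koch"}
universe = products | people
computer_related = {"Apple-II-plus", "Atari-800"}

# Logical NOT: everything not computer-related, including people,
# which is not what a user critiquing a product query means.
logical_not = universe - computer_related

print(sorted(logical_not))                           # ['Jones', 'Koch', 'Toaster']
print(sorted(prohibit(products, computer_related)))  # ['Toaster']
```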

The paradigm of retrieval by reformulation was generated from certain psychological theories of human remembering (Williams & Hollan, 1981; Williams, 1981; Norman & Bobrow, 1979) and extended, in part, based on research from cognitive science (Rissland, 1980; Smith & Medin, 1981; Kolodner, 1980; Bower, Black & Turner, 1979; Schank, 1980; Williams et al., 1982).

A couple of RABBIT stories

Up to this point the focus of this article has been the description of the behavior of RABBIT and the paradigm of retrieval by reformulation. This description still leaves our initial question unanswered: what is the work that RABBIT is doing that makes it any better than any other database interface? To answer this question we need to know both what work RABBIT is doing for the user and how that work is done.

The answer (or answers) to this question are essential both to guide any efforts to extend the system and to inform us about what characteristics can be eliminated to streamline the system (in the hope of building interfaces on something less than a $30,000 personal workstation).

Perhaps because RABBIT was designed by imitating retrieval by reformulation in human remembering, and because there are a variety of existing paradigms for database interfaces (e.g. natural language (PLANES, INTELLECT), table-driven queries (QBE), hierarchical menus (Viewdata)), there is no single answer to our question.


Rather, over the last couple of years, a collection of 10 "stories" has evolved, each capturing a different aspect of the critical work being done.

Due to space limitations, I have limited myself to brief presentations of three "stories". The stories were selected to indicate some of the diversity in the accounts available.

DOMAIN OF QUERIES

One common criterion for the power of a query interface has been the expressiveness of the query language, that is, the domain of queries expressible in the language or interface. This, in part, accounts for the success of and interest in relational query languages. In particular, once I can map a given query language onto the relational calculus/algebra, I can guarantee a certain minimum level of expressiveness. Often a bit more expressiveness is required beyond the relational calculus. This is generally handled by showing that a given interface permits you to express certain classic benchmarks: for example, presuming the appropriate database, the query that retrieves all the employees who make more than their managers. The difficult part of this query is that it requires both an important class of numeric predication (inequality) and a facility for expressing and retrieving on what are known as "linking variables". The common feeling seems to be that if I can prove I have achieved an equivalence with the relational calculus and can handle certain of these classic benchmarks, then I have saturated my need for expressiveness (until, perhaps, you come up with a new benchmark that we can all agree is important). In effect, I have achieved a theoretical limit.
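For concreteness, the classic benchmark can be written in SQL as a self-join, where the second tuple variable (`m`, the manager) plays the role of the linking variable. The sketch runs it against an in-memory SQLite database through Python's standard `sqlite3` module; the table layout and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (name TEXT PRIMARY KEY, salary INTEGER, manager TEXT);
INSERT INTO emp VALUES
  ('Smith', 50000, NULL),
  ('Jones', 55000, 'Smith'),
  ('Brown', 40000, 'Smith'),
  ('Koch',  45000, 'Jones');
""")

# e ranges over employees; m is linked to e through e.manager = m.name,
# and the inequality compares the two tuples' salaries.
rows = conn.execute("""
SELECT e.name
FROM emp AS e JOIN emp AS m ON e.manager = m.name
WHERE e.salary > m.salary
ORDER BY e.name
""").fetchall()
print(rows)  # [('Jones',)]
```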

This view confuses theoretical limits of expressiveness with practical limits. Consider, for example, the limits on the number of fields or attributes in any given relation (or object). There is no formal limit. There is no theoretical reason why I cannot create a relation with 1000 fields (indeed, some back-end database machines provide for such a possibility). Yet, with the exception of certain statistical databases that generally yield sparse matrices, this is just not done. This is not because one would not want to. If I wanted a database that described a person, I might well want to put in thousands of fields (consider military service records, which commonly contain dozens of pages of forms, or a health record). The problem is that no one has an effective interface for such relations. How could I show you even one instance (tuple or record)? Even with a scrolling screen you could spend all day surveying the information. As a practical result, relations in common databases have from five to 20 fields.

A second practical limitation is the number of relations (or types of objects). Again, there is no theoretical limit on the number of relations permissible for any given database. Nevertheless, relational databases commonly contain a few tens of relations (e.g. 10-50 or so). The reason for this practical limit is that database interfaces require the user to produce the name of the relation he wants to refer to in the query. Generally, this means that the user has to have the names of the relations memorized, so each added relation requires more knowledge on the part of the user. Again, the problem is not a lack of demand for databases with thousands of relations; rather, the problem is the interface.

As a consequence of these pragmatic limits, relational databases tend to consist of a few tens of relations, each with a few tens of fields and thousands of tuples or instances. Relational databases (with traditional interfaces) are effective with large numbers of instances of a few simply described types.


RABBIT offers a partial solution to both of these problems. The construction of dynamic perspectives permits one to conceive of relations with thousands of fields, since an instance need only be displayed through the attributes relevant to the current perspective. The suggestive presentation of the more precise conceptual categories into which the displayed instance fits (e.g. "ComputerRelatedProduct" in Fig. 2), coupled with critique commands on these categories, permits their successive refinement, so the user need not have the names of the relations memorized in advance.†
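One way to picture how dynamic perspectives and category critiques interact is the toy sketch below; it rests on several assumptions and is not RABBIT's actual mechanism. The names `PERSPECTIVES`, `project`, `suggestions`, and `require` are hypothetical. A perspective is modeled as a named subset of attributes; the instance is displayed only through the perspectives its current query mentions; the categories the instance belongs to but the query does not yet use are offered as suggestions; and a `require`-style critique adds one of them to the query.

```python
from typing import Any

# Hypothetical perspectives: each conceptual category names the attributes
# that are relevant when an instance is viewed as a member of that category.
PERSPECTIVES: dict[str, list[str]] = {
    "Person": ["name"],
    "Employee": ["department", "salary", "manager"],
    "Patient": ["blood_type", "allergies", "physician"],
}


def project(instance: dict[str, Any], query: list[str]) -> dict[str, Any]:
    """Display the instance only through the perspectives named in the query.

    The record may hold thousands of attributes; the user sees just those
    belonging to the categories the current query mentions.
    """
    shown: dict[str, Any] = {}
    for category in query:
        for attr in PERSPECTIVES.get(category, []):
            if attr in instance:
                shown[attr] = instance[attr]
    return shown


def suggestions(categories: set[str], query: list[str]) -> list[str]:
    """Categories the displayed instance fits into that the query does not yet use."""
    return sorted(categories - set(query))


def require(query: list[str], category: str) -> list[str]:
    """A hypothetical 'require' critique: reformulate the query to include the category."""
    return query if category in query else query + [category]


# Illustrative instance; the attributes and categories are invented.
record = {
    "name": "J. Smith", "department": "Sales", "salary": 52000, "manager": "A. Jones",
    "blood_type": "O+", "allergies": "none", "physician": "Dr. Lee",
}
record_categories = {"Person", "Employee", "Patient"}

query = ["Person", "Employee"]
view = project(record, query)    # name, department, salary, manager only
hints = suggestions(record_categories, query)  # ["Patient"]
query = require(query, "Patient")
wider_view = project(record, query)  # now also blood_type, allergies, physician
```

Because the displayed view is computed from the query rather than fixed by the schema, adding attributes to a record does not by itself enlarge what the user must survey; only the perspectives he selects do.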

There is a variety of other pragmatic limitations on the domain of expressible queries for which RABBIT offers a partial solution, though not all of these limitations are tied to relational query languages. Some are limitations of browsing-like interface paradigms such as ZOG (Robertson, McCracken & Newell, 1981); others are limitations of strictly hierarchical systems such as VIEWDATA and PRESTEL. Among these limitations are limits on the number of objects (users of browsing systems tend to get lost as that number grows) and limits on the relations between object types (e.g. restaurant vs. business).

USING WHAT THE USER KNOWS

One requirement for anyone proposing a new design paradigm in any field is a boundary condition analysis that can be used to tell the designer when the paradigm will begin to fail. Retrieval by reformulation is no different from any other interface paradigm in that it has conditions where it collapses. Indeed, retrieval by reformulation is interesting in part because it has a very sharp boundary where it begins to fail. In particular, RABBIT makes extensive use of what the user already knows. It presumes the user knows much more about the generic structure of the world than RABBIT does, although RABBIT knows much more about the particulars (instances).

Imagine a database of all the world's wines. Consider a wine novice coming up to that database to select a likely wine for a dinner party. If the user starts RABBIT off by requesting a "wine" the system might return with Chateau Lafite, Burgundy, 1956. It might be described as a leggy wine, with a tannin content of some percentage, sugar balance of something or other, etc. The problem is that if the user does not know anything about the example or anything about the characteristics of wine, then there is nowhere to go. RABBIT does no work for the user in this case.

Of course, this particular problem could be solved if the database designer suspected the user audience would be novices (say, if the database were for a department store shopping system). In such a case the designer would simply tune the conceptual categories, and the attributes associated with each, to the probable user population. Then again, such tuning is likely to help any database interface. What RABBIT has done is take away the need for the user to know computer science, query languages, or the particular vocabulary of a specific database, or to have a precise prior conceptualization of the question(s) to be asked.

To get work out of the interface, the user must know something about the topic. RABBIT presumes that the user knows far more about the generic structure of the world than RABBIT does; the information problem facing the user is instead to determine what subset of his generic world knowledge RABBIT does understand. In effect, RABBIT informs the user of how RABBIT (more properly, the database designer) thinks about the world.

This boundary condition of using what the user knows can be seen over and over again in various properties of the RABBIT interaction. For example, the dynamic perspective mechanism serves, in part, to guide the user to access information along dimensions he is most likely to understand. A personnel worker is unlikely to build a query that accesses details of a person's medical history.

† Note that I am not claiming that retrieval by reformulation solves these problems for all time. Rather, I suspect RABBIT has just advanced the frontier a bit. The problems will recur, but we will be able to handle more complex objects and more heterogeneous database structures.

SEMANTIC RESOLUTION

One of the principal problems casual users encounter while creating a query is misinterpretation of vocabulary. PRESTEL and VIEWDATA interactions fail between 13.5 and 28% of the time (McEwen, 1981; Whalen & Latremouille, 1981). Young & Hull (1982) report that the major cause of these failures can be attributed to several distinct types of misinterpretation of the terms used in the database. A RABBIT-like interaction can remove many of these misinterpretations through what we call semantic resolution.

Consider a degenerate version of RABBIT that would present only a template for the items sought [such as is used in a QBE interface (Zloof, 1975)] rather than a fully instantiated example. In our computer example only blank attributes such as manufacturer:, cost:, disk: would appear. The problem facing a user at this point would be "what values fit into each of these attribute fields?". For manufacturer, would a company name (such as Atari, Apple, etc.) be the right thing? Or perhaps a country name (e.g. Japan, U.S.A., France)? Or possibly a company site (Xerox: Rochester plant)? Or maybe even a specific assembly line (Rochester #47)? The meaning of the term manufacturer is ambiguous from the template alone. Much of that ambiguity can be eliminated by the simple expedient of presenting a sample value. We get this for free in RABBIT as a side-effect of presenting a fully instantiated example.
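The contrast between a QBE-style template and a fully instantiated example can be sketched as follows. The helper names (`blank_template`, `instantiate`, `render`) and the two computer records are invented for illustration; the prices in particular are placeholders, not historical data, and matching a partial description by exact attribute values is a simplifying assumption.

```python
from typing import Any, Optional

Record = dict[str, Any]


def blank_template(attributes: list[str]) -> str:
    """QBE-style template: attribute names only, no hint of what values mean."""
    return "\n".join(f"{a}:" for a in attributes)


def instantiate(description: Record, database: list[Record]) -> Optional[Record]:
    """Return the first stored instance satisfying the partial description, if any.

    Every attribute of the returned instance carries a sample value, which
    suggests what kind of value the attribute expects (a company name rather
    than a country, say).
    """
    for record in database:
        if all(record.get(k) == v for k, v in description.items()):
            return record
    return None


def render(record: Record, attributes: list[str]) -> str:
    """Show the instance as attribute: value lines."""
    return "\n".join(f"{a}: {record.get(a, '')}" for a in attributes)


# Invented sample data; the costs are placeholders.
computers = [
    {"name": "Apple II Plus", "manufacturer": "Apple", "cost": 1200, "disk": "floppy"},
    {"name": "Atari 800", "manufacturer": "Atari", "cost": 900, "disk": "floppy"},
]
attrs = ["manufacturer", "cost", "disk"]

print(blank_template(attrs))  # the user must guess what belongs after each colon
example = instantiate({"disk": "floppy"}, computers)
if example is not None:
    print(render(example, attrs))  # each attribute now shows a sample value
```

The template and the instance share the same attribute list; the only difference is that the second carries values drawn from the database, which is what resolves much of the ambiguity.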

Just as the attribute term manufacturer is ambiguous, so is the value "Apple". Do we mean Apple the company, Apple the fruit, or Apple as the short name for an Apple-II-plus computer? But the term Apple occurring in the context of "manufacturer: Apple" has much of its ambiguity removed. The ambiguity of both terms is further reduced because they occur in the context of the entire example instance. The whole does have more meaning than the sum of its parts.
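A toy model of how an attribute's context narrows an ambiguous value: each candidate sense of a term carries a semantic type, and each attribute declares the type of value it expects. The lexicon, the type labels, and the `readings` function are hypothetical, and the sketch captures only the attribute-level context, not the further narrowing contributed by the rest of the example instance.

```python
from typing import Optional

# Hypothetical lexicon: candidate senses of a term, each tagged with a semantic type.
SENSES: dict[str, list[tuple[str, str]]] = {
    "Apple": [
        ("Apple (the company)", "Company"),
        ("apple (the fruit)", "Food"),
        ("Apple II Plus (the computer)", "Computer"),
    ],
}

# Hypothetical schema: the semantic type each attribute expects as its value.
EXPECTED_TYPE: dict[str, str] = {
    "manufacturer": "Company",
    "model": "Computer",
    "ingredient": "Food",
}


def readings(term: str, attribute: Optional[str] = None) -> list[str]:
    """Senses of `term`, narrowed by the attribute it fills when one is given."""
    senses = SENSES.get(term, [])
    wanted = EXPECTED_TYPE.get(attribute) if attribute is not None else None
    if wanted is None:
        # No usable context: every sense remains possible.
        return [label for label, _ in senses]
    return [label for label, kind in senses if kind == wanted]


print(readings("Apple"))                  # three candidate senses
print(readings("Apple", "manufacturer"))  # ['Apple (the company)']
```

In this toy the term alone admits three readings, while "manufacturer: Apple" admits one; a fuller model would intersect constraints from every attribute of the displayed instance.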

Summary

This article has had three objectives: to describe the current behavior of the RABBIT database interface, to define the essential properties of retrieval by reformulation, and to characterize at least some of the attempts to articulate the novel functionality of the interface.

I have attempted to document some of my progress toward an understanding of the work accomplished by the RABBIT interface and of the mechanisms essential to accomplishing that work. This understanding is in a pre-theoretical phase, consisting essentially of a series of semi-independent accounts or "stories", each of which captures a different aspect of the overall functionality of the system. Three of the most diverse of the 10 current stories are presented here.

The story on the Domain of Queries shows how a very general underlying query mechanism can be severely restricted by the pragmatics of an interface. Using What the User Knows demonstrates an important fundamental constraint on the performance of RABBIT-like interfaces and reflects a philosophy of "seeking boundary conditions" that I think is essential to good engineering practice. The story on Semantic Resolution suggests some important opportunities to begin a quantitative analysis of at least some properties of Retrieval by Reformulation.

In sum, research into interface designs of this class has just begun. Retrieval by Reformulation is a novel technique for casual-user interfaces to complex databases, but it will require extensive theoretical and pragmatic analysis to grow beyond its experimental status and fully exploit the opportunities it presents.

An extraordinary acknowledgement belongs to Austin Henderson for letting me be his apprentice on my first LISP project. He got RABBIT III started and provided essential advice and ideas throughout its development. The "path grapher" idea, in particular, is his invention. Path grapher served as the catalyzing force in getting the analysis functions into RABBIT. Special thanks also go to Naomi Miyake, who first made me aware of the assumption RABBIT makes about using what the user knows, Tom Malone for discussions on Exemplar Search, and Tom Moran for extensive discussions throughout the development of RABBIT I, II, and III.

References

BOWER, G. H., BLACK, J. B. & TURNER, T. J. (1979). Scripts in text comprehension and memory. Cognitive Psychology, 11, 177-220.

KOLODNER, J. L. (1980). Retrieval organization strategies in conceptual memory: a computer model. Research Report r Department of Computer Science, Yale University, New Haven, Connecticut.

MCEWEN, S. A. (1981). An investigation of user search performance on a Telidon information retrieval system. Telidon Behavioral Research 2, Department of Communications, Ottawa (May).

NORMAN, D. A. & BOBROW, D. G. (1979). Descriptions: an intermediate stage in memory retrieval. Cognitive Psychology, 11, 107-123.

RISSLAND, E. (1980). Example generation. In Proceedings Third National Conference of the Canadian Society for Computational Studies of Intelligence.

ROBERTSON, G., MCCRACKEN, D. & NEWELL, A. (1981). The ZOG approach to man-machine communication. International Journal of Man-Machine Studies, 14, 461-488.

SCHANK, R. C. (1980). Failure-driven memory. Cognition and Brain Theory, 1(4), 41-60.

SMITH, E. E. & MEDIN, D. L. (1981). Categories and Concepts. Cambridge, Massachusetts: Harvard University Press.

TOU, F. (1982). RABBIT: a novel approach to information retrieval. Unpublished M.S. thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts.

TOU, F., WILLIAMS, M. D., FIKES, R. E., HENDERSON, D. A. & MALONE, T. (1982). RABBIT: an intelligent database assistant. Proceedings of the AAAI, Pittsburgh, Pennsylvania, August.

WHALEN, T. & LATREMOUILLE, S. (1981). The effectiveness of a tree-structured index when the existence of information is uncertain. Telidon Behavioral Research 2, Department of Communications, Ottawa (May).

WILLIAMS, M. D. (1981). Instantiation: a data base interface for the novice user. Xerox Palo Alto Research Center Working Paper.

WILLIAMS, M. D. & HOLLAN, J. D. (1981). The process of retrieval from very long term memory. Cognitive Science, 5, 87-119.

WILLIAMS, M. D. & TOU, F. (1982). RABBIT: an interface for data base access. Proceedings of the ACM, Dallas, Texas, October.

WILLIAMS, M. D., TOU, F., FIKES, R. E., HENDERSON, D. A. & MALONE, T. (1982). RABBIT: cognitive science in interface design. Proceedings of the Cognitive Science Society, Ann Arbor, Michigan, August.


YOUNG, R. M. & HULL, A. (1982). Cognitive aspects of the selection of viewdata options by casual users. Proceedings of the 6th International Conference on Computer Communication, London, September.

ZLOOF, M. M. (1975). Query by example. In Proceedings of the National Computer Conference, pp. 431-437. Arlington, Virginia: AFIPS Press.
