Determining the Functionality and Features of an Intelligent Interface to an Information Retrieval System


Nicholas J. Belkin

School of Communication, Information & Library Studies Rutgers University 4 Huntington Street

New Brunswick, NJ 08903 [email protected]

Pier Giorgio Marchetti

European Space Agency - IRS, via Galileo Galilei

00044 Frascati Italy

[email protected]

ABSTRACT

In this paper, we propose a method for specifying the functionality of an intelligent interface to large-scale information retrieval systems, and for implementing those functions in an operational environment. The method is based on a progressive, three-stage model of intelligent information support; a high-level cognitive task analysis of the information retrieval problem; a low-level specification of the host system functionality; and derivation of explicit relations between the system functions and the cognitive tasks. This method is applied, by example, in the context of the European Space Agency Information Retrieval Service, with some specific suggestions for implementation of a stage one intelligent interface to that system.

1. Introduction

In this paper, we consider the issue of design of intelligent interfaces to existing information retrieval (IR) systems. In particular, we address the following questions:

1. what should the functions of such an interface be; that is, what constitutes intelligence in such an interface;

2. what factors ought to influence the design of such an interface, and how; and,

3. how could such an interface be implemented?

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

(C) 1990 ACM 0-89791-408-2 90 0009 151 $1.50

We address these questions here within the specific context of the large operational IR system of the European Space Agency (ESA-IRS, using ESA-QUEST), specifying an interface design which takes account of its characteristics and constraints. However, we intend this as an example of the application of our method, which we believe, and attempt below to demonstrate, is general at least to the class of large, operational IR systems. Our context is such that we assume in our discussion that our interface will be tightly coupled to the host system and as much as possible supported by it in its basic functions. Thus we will be relatively unconstrained by factors such as telecommunication problems that other interfaces (e.g. Marcus, 1985; Robertson, et al., 1986) have had to face, and will be able to make use of system facilities which might otherwise be unavailable. Nevertheless, it seems to us that our general procedures will be applicable to the design of both host and remote interfaces.

2. A Design Sequence for an Intelligent Interface

Our general approach to the problem of intelligent interface design is to take a sequence of design steps, outlined below, which lead to a design specification. This sequence is:

1. The statement of a three-stage model of intelligent support for end users of operational IR systems;

2. Specification of a cognitive task analysis for IR systems in general (a high-level functionality specification);

3. Specification of the underlying functionality of the host system (here, ESA-QUEST, the search facility for ESA-IRS);

4. Establishment of explicit relationships between the cognitive task analysis and system functionality;

5. Specification of an 'ideal' interface design, given the relationships between desired functions identified in the cognitive task analysis, and available (or possible) system functionalities;

6. Identification of environmental constraints and human-computer interaction design principles and specification of their effects on interface implementation;

7. Design specification for implementation of a stage-one interface.

The body of this paper is the description, and some results, of application of some aspects of this design sequence within our specific context. Here, we present a rationale for this particular approach to the problem.

We begin with a staged model for intelligent interface design on the premises that:

a. the actual accomplishment of such an interface will need to proceed through some sequence of intermediate implementations due to operational constraints, and in order to respond to principles of formative design (Egan, et al., 1989); yet,

b. to maintain continuity of implementation and in the design process, we need some overall view of what the intelligent interface should accomplish.

Thus, the stages, discussed in section 4, serve as a means to integrate a series of interface designs and implementations within an overall goal, maintaining consistency in design decisions.

In contrast to some other approaches taken to intelligent interface (or intermediary) design (see section 3), we attempt to develop an abstract, functional model of what such an interface should accomplish, in terms of the problem faced by the IR system (which includes the user). This approach is based on the concept of cognitive task analysis (Roth & Woods, 1988), and requires a specification of the goal of the system and the decisions, tasks or functions required to achieve that goal. The result of such an analysis thus sets the 'ideal' requirements for the interface. We present such an analysis in section 5.

The possibility for accomplishing our ideal interface is constrained to some extent by the functionality of the underlying host system, in particular its data structures, available knowledge sources and retrieval facilities. In this instance, we accept that our interface will be to some existing such system, which could conceivably be modified in some ways, but which will remain basically static. Our view on how to deal with the constraints that this imposes is to specify, at a very low level, that which the system can actually do, on the assumption that combinations of low-level functions can lead to more complex functionalities. In section 6, we indicate the form and content of such a description.

Our next step is to attempt to relate the ideal functionality for the interface to the possibilities afforded by the system. One can view this as trying to move down in level of detail from our cognitive task analysis, and up in level of detail from our system functionality specification, with the intention of meeting somewhere in the middle with a specification of interface functionality which responds to the ideal level and is possible to implement. This procedure requires compromises from each end, in terms, for instance, of reduction of aspiration, and of identification of desirable system modifications. One point of this procedure is to identify these compromises in a principled way, so that they can be considered as candidates for incorporation in the interface at its various stages of implementation. We outline such a procedure, and examples of its resulting stage one interface specification, in section 7.
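As a purely illustrative sketch of this meeting-in-the-middle step, one might record which low-level host functions each cognitive task would need, and then separate what the host already supports from what would require modification. All task and function names below are hypothetical placeholders, not actual ESA-QUEST facilities, and the set-difference approach is an assumption of this sketch rather than the procedure of section 7.

```python
# Hypothetical mapping from cognitive tasks to the low-level host functions
# each would need. None of these names are real ESA-QUEST commands.
TASK_REQUIREMENTS = {
    "general_topic_identification": {"browse_index", "display_thesaurus"},
    "database_selection": {"list_databases", "cross_file_term_counts"},
    "query_specification": {"select_terms", "combine_sets"},
}

# Hypothetical inventory of what the host can actually do today.
HOST_FUNCTIONS = {"browse_index", "list_databases", "select_terms", "combine_sets"}


def relate(task_requirements, host_functions):
    """Split each task's needs into what the host offers and what is missing."""
    report = {}
    for task, needed in task_requirements.items():
        report[task] = {
            "supported": sorted(needed & host_functions),
            "missing": sorted(needed - host_functions),
        }
    return report


def stage_one_candidates(report):
    """Tasks whose every required function already exists on the host."""
    return sorted(task for task, entry in report.items() if not entry["missing"])


report = relate(TASK_REQUIREMENTS, HOST_FUNCTIONS)
print(stage_one_candidates(report))
print(report["database_selection"]["missing"])
```

In such a sketch, the "missing" entries play the role of candidate system modifications, while tasks with nothing missing are natural candidates for an early implementation stage.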


Any interface to an IR system will be implemented within certain environmental constraints established, for example, by its users, the host's general policies, telecommunication and hardware issues, and so on. These constraints will have effects on which interface functions will be implemented at any stage of development, and on how they can be implemented. Furthermore, the actual implementation of any functions must take account of what we know of the constraints (e.g. ergonomic, cognitive) of human-computer interaction in general. Both types of constraints are extremely important, but our view is that they should be considered only insofar as they relate to a prespecified functionality. In section 8 we discuss some such constraints in our environment, and suggest how they could affect implementation of our interface design.

This sequence of activities leads to identification of a staged sequence of interface functionalities, and the subsequent direct specification of a stage one interface implementation. In section 9, we give some examples of an implementation of a stage one intelligent interface to the ESA-IRS system.

3. Interaction and the Function of the IR Interface

Much previous work in IR has considered the information retrieval process as a sequence of steps leading to a search statement, which is then put to the system. The system then responds to that statement, the user evaluates the response, and modification and iteration take place as appropriate. In this view, interaction between the user and the other components of the system can take place at the various steps, but its primary locus is at the search statement reformulation stage. The prototype model for this view of IR is relevance feedback (e.g. Robertson & Sparck Jones, 1976).

An alternative view is one of progressive development or refinement of a search formulation (to include search strategy). This view was exemplified in the THOMAS system (Oddy, 1977), which was designed to allow the user to develop a search through interaction with the system, without necessarily establishing an explicit query formulation. More recently, Belkin & Vickery (1985), Ingwersen & Wormell (1986), and Bates (1986), among others, have suggested that the process of IR is inherently interactive, and that the user's interaction with the data base should be construed as a process during which a search, and query, is gradually constructed, through formulation and reformulation. The difference between this view of IR and the more traditional one lies primarily in the status of the query, which in the traditional view is taken as an explicit statement of the user's information requirement, but in the interactive view as a means to an eventual desirable response.

The former view of IR arises, it seems, from the original context of computerized IR systems, in which elaborate queries and search strategies were constructed through interaction between the user and a human intermediary, highly trained in the characteristics of the system to which the query would be put. After such a search formulation was constructed, the query would be put to the system, and after a delay of up to two weeks, a response would be sent to the requestor. This procedure effectively removed any possibility of direct interaction between user or intermediary, and the rest of the system, and reified the evolved query and search strategy. A similar procedure and attitude underlies the typical Selective Dissemination of Information (SDI) search profile, still very much in use.

Although there is substantial interaction in the development and formulation of the search in the traditional view, it has been almost exclusively between the user and the human intermediary, separated from the interaction between the search specification and the rest of the IR system. This separation has tended to be reflected in a general procedure for conducting IR in the current online environment, which sees much of search formulation taking place off-line, in interaction between user and intermediary, with only some modification due to interaction (primarily concerned with methods for reducing or increasing retrieved set size in Boolean systems) occurring online. This attitude has been reinforced by the pricing policy of most data base hosts, which has a large component of connect-time costs, and by the costs of telecommunication. This situation has led, in turn, to the development of front-ends (or, in general, interfaces) to commercial IR systems which typically support off-line query specification, but very little, if any, online interaction, apart from simplifying the native command language in various ways (see Hawkins & Levy, 1986, for a survey of such interfaces).

Our view is that for end-user searching in IR systems, one should incorporate all of the necessary functions performed by the human intermediary in interaction with the user in the interface or in the system itself, taking the view that the whole process is one of progressive, interactive search formulation. This is now the general approach taken by research in 'intelligent information retrieval' (e.g. Belkin, Brooks & Daniels, 1987; Croft & Thompson, 1987), in which wholly new IR systems are being designed; it seems to us that this approach applies equally to front ends or interfaces to existing systems. The context of our project supports this view, since our interface is planned to be embedded as much as possible in the host, and since ESA-IRS does not charge for connect time. These two conditions allow for development of an interface which actively supports user interaction with the system.

There are many front ends or interfaces to IR systems which are in production, in experimental condition, or proposed. Most of them are, from our point of view, concerned with mechanical issues which are not of great relevance to the intelligent interface problem. We mention here four such interfaces which do exhibit features which are of relevance to this program, and attempt to relate them to our work.

One of the first attempts at an intelligent interface for IR systems was the IIDA project (Meadow, Hewett & Aversa, 1982). This system was implemented on a mainframe connected to a single host search system, and its goals were two-fold: to provide tutorial assistance to end users in learning to use the host system; and, to intervene and offer assistance to users during the search process, by identifying patterns of searching behavior and suggesting alternative actions. The first goal was achieved off-line in tutorial mode, and also through online help facilities; the second was achieved by monitoring user-system interaction and offering help in cases such as syntactic errors and errors in search formulation and strategy.

At about the same time as IIDA, the CONIT interface was developed at MIT (Marcus & Reintjes, 1981). This interface took a somewhat different approach than that of the IIDA project, identifying the problem with which it was concerned as simplified access to a number of heterogeneous systems. The CONIT response to this issue was two-fold: to develop a simplified command language for the end user, which the system translated into the appropriate format for the particular system with which it was connected; and to offer both substantial off-line explanation of searching in general and of its commands, and also extensive online help facilities, based on explanations of its commands. A result of this project was the realization of the significance of support for the user in search formulation. This realization was the basis of various enhancements to CONIT, reported in Marcus (1985), which were aimed at helping the user to formulate a query in terms which could be interpreted by the host command language, yet did not require explicit Boolean formulation by the user.

A somewhat similar approach has been taken in the OAK project (Meadow, et al., 1989; Borgman, Case & Meadow, 1989), who developed an interface to the RECON retrieval system for the Department of Energy. An interesting aspect of this interface is that it was based on empirical studies of the proposed users of the system, which was a definite innovation in the design of front ends. The goal of this interface was to help the users to search in the one specific system for which the interface was designed. The approach to this goal was to provide two-stage support: stage one being a tutorial program in online searching in general and in searching using the specific front end system of OAK; stage two being a search assistance program which aimed to assist in the initial formulation of the search statement, and in search term and output evaluation. The most interesting part of the OAK interface, from our point of view here, is the assistance program (OAKASSIST). The approach taken was to develop a new retrieval interface, based on the idea of the IR search as a set of facets, combined by the Boolean 'and', with each facet being a set of synonymous or related terms combined by the Boolean 'or'. Since two very difficult aspects of Boolean search languages are understanding the Boolean operators and remembering commands and using them appropriately, this interface attempts to do away with the necessity for these features. The user is prompted, through menus, to specify the search topic as a number of facets, with one window to each facet. Within each facet, the user is prompted to enumerate related terms, and receives some support through available facilities for browsing in indexes of terms. When the user has specified all of the desired facets and their terms, the front end combines them into the appropriate search formulation, which it sends off to the host system. There are also facilities in OAKASSIST to support the user in evaluating the retrieved documents.
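The facet idea described above can be illustrated with a minimal sketch: terms within a facet are joined by 'or', and the facets are joined by 'and'. The function name and the generic Boolean syntax below are assumptions made for illustration; they are not the OAKASSIST implementation, nor the command syntax of RECON or any other host.

```python
def build_faceted_query(facets):
    """Combine facets with AND; combine the terms inside each facet with OR.

    `facets` is a list of lists of terms, one inner list per facet.
    The output uses a generic Boolean notation, not a real host language.
    """
    groups = []
    for terms in facets:
        cleaned = [t.strip() for t in terms if t.strip()]
        if not cleaned:
            continue  # skip facets the user left empty
        # Quote multi-word terms so they stay together as phrases.
        quoted = ['"%s"' % t if " " in t else t for t in cleaned]
        group = " OR ".join(quoted)
        # Parenthesize only when a facet holds more than one term.
        groups.append("(%s)" % group if len(quoted) > 1 else group)
    return " AND ".join(groups)


facets = [
    ["solar cell", "photovoltaic"],
    ["satellite", "spacecraft"],
    ["degradation"],
]
print(build_faceted_query(facets))
# ("solar cell" OR photovoltaic) AND (satellite OR spacecraft) AND degradation
```

The user of such a front end only ever fills in the inner lists; the Boolean operators are supplied entirely by the interface.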

One of the more recent interfaces to IR systems is called TOMESEARCHER (Vickery, 1988). This system, based on an earlier expert system for referral called PLEXUS (Vickery, et al., 1988), is explicitly aimed at support for end users in query construction, in particular in selecting appropriate terms for searching. This intelligent intermediary is designed to interface to a variety of data bases; but for each data base it accesses it requires substantial knowledge of various aspects of that data base to be held in its own memory. The reason for this is that the main feature of TOMESEARCHER is its support for off-line search formulation, including help with establishing whether terms that the user enters are likely to be useful given the posting frequencies of terms in the data base. This front end helps the user to formulate a query without forcing an explicit Boolean formulation (much in the spirit of OAKASSIST, although without its fairly sophisticated graphics and interaction mechanisms), and attempts to take account of various user-defined parameters, such as search output requirements.
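To make the idea of off-line term screening against posting frequencies concrete, the following is a rough sketch under stated assumptions. The three labels, the 20% cut-off and the function name are all hypothetical choices for illustration; the source does not describe how TOMESEARCHER actually judges usefulness.

```python
def screen_terms(terms, postings, database_size, max_fraction=0.2):
    """Label each candidate term using a locally held posting-frequency table.

    `postings` maps lower-cased terms to the number of records indexed by
    that term. `max_fraction` is an assumed cut-off above which a term is
    treated as too common to discriminate.
    """
    labels = {}
    for term in terms:
        count = postings.get(term.lower(), 0)
        if count == 0:
            labels[term] = "not found"
        elif count > max_fraction * database_size:
            labels[term] = "too common"
        else:
            labels[term] = "useful"
    return labels


postings = {"satellite": 50000, "thermal": 1200}
print(screen_terms(["Satellite", "thermal", "ablator"], postings, 100000))
```

Because the table is held by the front end, such a check needs no connection to the host, which is the trade-off noted above: off-line support at the price of storing data base knowledge locally.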

These systems, it can be fairly said, represent the state of the art in front ends to operational IR systems. Although they all attempt to support query and search formulation, and they all take some care in assisting interaction between user and front end, none of them actually supports the progressive development of query and search formulation through interaction directly with the data base. Indeed, in most of them, it is extremely difficult for the user to take account of system feedback to modify the query or search strategy. Thus, all of these front ends still subscribe to the two-part model of IR. Nevertheless, they also all exhibit features that we would expect to incorporate in any intelligent interface, including the one which we propose here.

These features include the concept of faceted search expression without specification in Boolean terms, substantial help and tutorial facilities, assistance in search term selection based on characteristics of the data base itself, replacement of the native host language with simpler and easier to use alternatives, and responsive and easy to use interface characteristics. In general, all of these front end systems attempt to support the user in search formulation and evaluation of output, rather than to do the search for the user, but all of them also attempt to take some responsibility for dealing with the intricacies of actually formulating and implementing a search.

What these front ends do not do is to support a truly interactive IR system which includes user, intermediary and data base, nor do they incorporate and emulate, except in some very default ways, the knowledge and activities that human intermediaries bring to IR systems. These in particular are the areas which we believe are required for a really intelligent interface to IR systems; below, we present an analysis which identifies the functions which such an interface would need to perform, and suggest ways in which these functions could be implemented.


4. A Three-Stage Approach to Intelligent IR Support

The attempt to provide truly intelligent support for end-users of IR systems will naturally require several generations of interface implementation. We say naturally on two grounds. In the first instance, we take the view that the principles of formative design, as suggested by Egan, et al. (1989), are appropriate in our situation. That is, we have some ideas about the bases on which such a system should be built, but many details of the implementation are unclear, as are the effects on performance of specific aspects of our general system design. So we assume that our interface will go through several iterations of design, implementation, evaluation, redesign, and so on.

Our second reason for this statement has to do with operational realities and implementation strategies. The requirements on our host system of an ideal intelligent interface will be substantial, and it is probably unrealistic to expect them to be met immediately. Therefore, our expectation is that we will need to begin with a less taxing interface, to test it, and only after having gone through some formative design, to move on, if justified, to a next generation of capabilities.

This strategy for implementation, although it responds to system constraints, and also to design principles, also poses some problems. The most important of these, from our point of view here, is that of maintaining consistency from generation to generation; that is, making sure that any one implementation actually fits in to our overall 'ideal' interface. To this end, we suggest the following three-stage model toward intelligent information support.

Stage one we characterize as efficient end-user support. At this stage, we anticipate only that the interface mask the underlying query language, and that the knowledge contained in the interface, that is used to direct the interaction with the user, is primarily knowledge of the structure of the system. At this stage, the interface maintains static models of, for instance, the user and the system's functions.

The second stage, which we call knowledge-based end-user support, builds upon the first by incorporating a model of what constitutes a search strategy, and incorporating knowledge of system functionality within the more flexible search strategy functionality. At this level, more active support is offered to the user in the search formulation process, based on knowledge of the topic of the search, as well as of the system functionality. User models will probably be adaptive in some respects.

The third stage, intelligent end-user support, will move to adaptive and interactive support of the user at the search-session level. By this we mean that the interface will be able to adapt its functions and operations to what it has learned of the user during the course of the current search session. At this stage, the interface's knowledge will be of system, topic and user, and it will construct and maintain models of all of these which will change during the course of the session, as required, and which will be used to guide the interaction and to provide support to the user in search formulation and evaluation.
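One way to keep the three stages consistent with one another during implementation is to write down, for each stage, the kinds of knowledge the interface holds and the nature of its user model. The sketch below does this with hypothetical class and field names; carrying the search-strategy model forward into stage three is an assumption of this sketch, since the text lists only system, topic and user for that stage.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    EFFICIENT = 1        # efficient end-user support
    KNOWLEDGE_BASED = 2  # knowledge-based end-user support
    INTELLIGENT = 3      # intelligent end-user support


@dataclass(frozen=True)
class StageProfile:
    knowledge: frozenset  # kinds of knowledge held by the interface
    user_model: str       # how the user model behaves at this stage


PROFILES = {
    Stage.EFFICIENT: StageProfile(frozenset({"system"}), "static"),
    Stage.KNOWLEDGE_BASED: StageProfile(
        frozenset({"system", "search strategy", "topic"}), "partly adaptive"
    ),
    # Assumption: the stage-two search-strategy model is retained here.
    Stage.INTELLIGENT: StageProfile(
        frozenset({"system", "search strategy", "topic", "user"}),
        "adaptive within the session",
    ),
}


def knowledge_added(stage):
    """Return the kinds of knowledge a stage adds over the one before it."""
    if stage == Stage.EFFICIENT:
        return PROFILES[stage].knowledge
    previous = Stage(stage - 1)
    return PROFILES[stage].knowledge - PROFILES[previous].knowledge


for stage in Stage:
    print(stage.name, sorted(knowledge_added(stage)))
```

Listing only the increments makes it easier to check that an earlier stage is built so that the later additions can be accommodated.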

Our point in specifying these three stages is primarily so that when implementing the earlier stages, we construct them so that they respond to the anticipated requirements of the later stages, insofar as this is possible. Our task then, is to specify just what we consider the third stage of intelligent end-user support to require, in functional terms, and then to identify explicitly how the first stage of this ideal functionality can be implemented. In the next section, we propose a cognitive task analysis which leads to such functional specification.

5. A Cognitive Task Analysis for Information Retrieval

5.1. Goals analysis

A cognitive task analysis (CTA) (Roth & Woods, 1989) is a specification of the functions and tasks that need to be performed in order to achieve some overall goal, and of the decisions that need to be made in the performance of those tasks. In order to identify the required functionality of our interface, we begin with a CTA for information retrieval, especially from the points of view of the user and the intermediary (whether human or computer) to the data base.

In order to establish a CTA, it is necessary first to establish an overall goal for the system. In the IR system setting, we can do this by defining a hierarchy of goals, and then specifying the level (or levels) of goals which we wish to attain (see Daniels, Brooks & Belkin, 1985). The advantage to this approach is that it allows one always to view any particular goal within its context, and also to relate specific tasks within one goal level to tasks which might be necessary at another. The goal hierarchy which we propose for IR is specified in figure 1, taken from Daniels, Brooks & Belkin (1985).

LEVEL   GOAL

1       User leaves system
2       User is satisfied
3       Appropriate response to user
4       System is inappropriate, or Appropriate information
5       Effective search formulation

Figure 1. A goal hierarchy for information retrieval.


The overall goal of the IR system (the level one goal) is that the user leave the system. This is because the IR system is a support mechanism for users, to which they have recourse only when their own resources are inadequate for responding to some other problem or goal. That is, being in the IR system is not the user's normal or desired situation.

The user may leave the system for a variety of reasons, including frustration, boredom and satisfaction. We need to know the variety of possible reasons for leaving the system, in order to design against some of them, and in favor of others. In particular, we wish to consider that the goal at level two is that the user leaves the system because s/he is satisfied.

One way to achieve the level two goal is that the user obtains an appropriate response from the rest of the system. Other ways include the possibility that the user obtains an inappropriate response, which may not be recognized as such. We wish only to consider the 'appropriate response' branch, which becomes our level three goal for IR. By appropriate, we mean a response which responds to the user's goal and general situation.

One means to achieve the level three goal of an appropriate response is to offer the user information appropriate to the situation. Another, for instance, is to suggest to the user that the particular system in which s/he is interacting is not appropriate to the user's situation. From our point of view, it is at this level that we can begin to consider IR system functional design, and to specify overall goals to be achieved by the system. In particular, we wish to consider, of possible level four goals, those of 'appropriate information', and 'system appropriateness'.

Thus, we ask, what is required in order to achieve a response of appropriate information? One means, particularly reasonable in the IR system environment with which we are concerned, is by establishing an effective search formulation. We take this to be a goal specification at level five. In order to achieve our other level four goal (system appropriateness), we need somehow to establish a representation of the user's situation which it is possible to compare to some representation of the system's capabilities. Although, in principle, this branch of the IR goal hierarchy is extremely important, we will, for the moment, not consider it, and devote our attention to the goal of establishing an effective search formulation (we return later to the issue of system appropriateness, and will demonstrate that aspects of what one needs to do in order to achieve an effective search formulation can also be used for the purpose of establishing system appropriateness).

We can define our level five goal of effective search formulation by specifying the tasks which are required in order to accomplish it. For this, we take the general Distributed Expert-Based Information System (DEBIS) model (Belkin, et al., 1987), especially as proposed in Belkin, Seeger & Wersig (1983) and elaborated in Belkin, Brooks & Daniels (1987). These level five tasks (or goals), outlined in figure 2, we then consider as the activities in which the user and intermediary collaborate in order to achieve an effective search formulation.

Level 5: Effective Search Formulation

PM: Problem Mode; DM: Dialogue Mode; UM: User Model; PD: Problem Description; PS: Problem State; EX: Explanation; RG: Response Generator; RS: Retrieval Strategies

Arrows indicate logical (empirically established) and temporal relations. General sequence of events is from top to bottom, with iteration and recursion.

Figure 2. A problem structure for information retrieval (after Daniels, Brooks & Belkin, 1985).

In Belkin, Hennings & Seeger (1984), and in Daniels, Brooks and Belkin (1985), a very rough partial order was suggested for the accomplishment of the level five goals. Belkin (1988) also suggested temporal characteristics of clarification (explanation and related activities). Croft and Thompson (1987) specified and implemented an order of accomplishment (or plan) of a similar set of goals for their intelligent intermediary system I3R, which is based on a set of distributed functions similar to our level five goals. This order, or general plan, follows the general structure of figure 2, going from top to bottom. Thus, these results suggest that constructing an effective search formulation proceeds, in general, from the goals of identifying an appropriate level and type of dialogue, a model of the user, a model of the user's state in the problem-solving process, and a model of the user's problem, to the goals of establishing a representation of the user's search topic and developing and implementing an associated search strategy, and finally to the goals of constructing and presenting an appropriate response to the user. Throughout, the goal of explanation is invoked as required.


In this goal analysis, we can say that the first goal, of establishing a suitable dialogue mode (DM), is achieved by user and intermediary in mutual agreement, and is of direct use to both. The models of the user (UM) and of the user's state in the problem solving process (PS) are used primarily by the intermediary in order to support other activities, such as deciding on system appropriateness (PM), determining output characteristics (RG), guiding explanation (EX), and so on. The model of the user's problem and search topic (PD) is used by both parties in formulating and developing the search. The representation of search topic and development of search strategy (RS) is a highly interactive process among the user, intermediary and data base, used by all. And the presentation of system output and evaluation by the user again involves and is used by both user and intermediary.

5.2. Search formulation support

Brooks (1986) and Daniels (1987) have specified details of Problem Description and Retrieval Strategy (Brooks) and User Modelling (Daniels) as they apply to an intelligent intermediary system. We draw on their results in order to specify the characteristics and details of the tasks required to accomplish active support for the user in the search formulation process. The specific tasks associated with search formulation support (and their associated level 5 goals) are:

Choice of interaction mode (DM);

General topic identification (PD);

Database selection (RS);

Specific search topic formulation (PD);

Query specification and representation (RS);

Search strategy formulation (RS);

Output constraints (RG);

Evaluation (All).

The accomplishment of these tasks is viewed as the evolving representation of a search topic and strategy in the context of an online IR system. It begins with the choice of interaction mode (having assumed relevance of system to the user's situation), and then moves to general topic identification. This result is used in order to define a context for specific search topic, query and search strategy formulation, and also as a means for data base selection. Once data bases have been selected, specific search topic formulation begins. This provides a basis for query specification, search strategy formulation and constraint identification, and is also used, as required, to reformulate the general search topic, or to make new decisions concerning data base selection. The result of query, strategy and constraint formulation leads to a search, whose results are used to support reformulation of any of the decisions which have led to the search. The whole process is iterated to completion, as decided by the user, the reformulation decisions at any point being based on evaluation of system response in terms of user requirements.
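The task sequence above can be summarised as a small data structure. The following is a minimal sketch for illustration only: the task names and goal codes are taken from the list above, while the function names and the rule that reformulating one decision repeats every later task are simplifying assumptions, not part of the specification.

```python
# Search formulation tasks and their associated goals, in the order listed above.
TASKS = [
    ("choice of interaction mode", "DM"),
    ("general topic identification", "PD"),
    ("database selection", "RS"),
    ("specific search topic formulation", "PD"),
    ("query specification and representation", "RS"),
    ("search strategy formulation", "RS"),
    ("output constraints", "RG"),
    ("evaluation", "All"),
]


def tasks_for_goal(goal):
    """Return the names of the tasks that serve a given goal code."""
    return [name for name, code in TASKS if code == goal or code == "All"]


def tasks_from(name):
    """Tasks to repeat, in order, when the decision made in `name` is reformulated.

    Assumption: every later task is revisited; the text allows freer movement.
    """
    names = [task for task, _ in TASKS]
    return names[names.index(name):]
```

For instance, `tasks_from("database selection")` lists the tasks that would be revisited if evaluation led the user to choose different data bases.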

Each of these tasks can be viewed as a set of problems which hinder the user in their achievement, with the goal of the intelligent interface being to provide support which helps the user to overcome these problems. In effect, this specifies details of the activities the interface needs to perform in effective search formulation support. In section 7, we present a structured description of these tasks (excepting interaction mode).

Accomplishment of these tasks will depend upon the functions which we have not discussed in detail here, especially user modelling and explanation. Daniels (1987) has specified details of the user model for an intelligent interface in a specific IR system context. Her results, and those of Brajnik, Guida & Tasso (1987), indicate that an effective user model will represent the user's goals (at various levels), the user's subject knowledge, the user's previous experience with IR systems, the user's status and the user's general background. These elements, singly and in combination, are used in accomplishment of all of the search formulation support tasks. For instance, choice of interaction mode will depend upon, among other things, the user's previous experience with IR systems in general, and with this system in particular; or, effective representation of a search topic will depend upon understanding what use the user intends to make of the retrieved information.

Similarly, effective interaction between user, interface and database, as required by the search formulation tasks, depends upon the user's understanding of the other components of the system and of her/his role in the system; explanation, or clarification, is the major means for ensuring that the user has appropriate understanding (Belkin, 1988). Whether explanation is offered to the user at any point, and what kind of explanation is offered, can be construed as dependent upon the interface's model of the user, and of its model of the user's understanding of the system.

Finally, we note that having models of the user's goals, experience, place in the problem solving process, problem and topic amounts to a representation which can be compared to a model of the system's capabilities, in order to decide whether the system is appropriate to the user's situation. For instance, if the interface knows that the user is a junior high school student who has a class assignment to write a four-page essay on DNA, it would be appropriate to inform the user that s/he should not expect to find useful information on ESA-IRS. Depending on what kinds of knowledge the interface has, it might be able to suggest an appropriate alternative system, such as an encyclopaedia. Thus, we see that the Problem Mode function, associated with the level four goal of determining whether the IR system is appropriate for the user, is capable of being supported by tasks also associated with search formulation support.


6. Host System Functionality

We specify host system functionality in order to establish the basic operations and resources available to the host, and, eventually, interface. These define the capabilities of the system, which either the host or the interface must manipulate, in order to respond to the desiderata of the CTA. Here, we mention a few such functions within ESA-QUEST, which are germane to our stage one interface specification examples in subsequent sections, or which respond directly to the search formulation task description which follows.

ESA-QUEST has the standard host structure of inverted indices to its records, with all that this implies. In particular, this gives substantial statistical characterization of its databases. Its resources include structured textual descriptions of its resident data bases, and online thesauri for some of them. ESA-QUEST has a number of pre-defined subject clusters of databases, and a facility for user-defined clusters, both types of which can be searched in cross-file mode.

Its searching facilities include all of the standard statements of searching for terms and combinations, and so on. It offers in addition a facility for doing quorum searching, which relaxes Boolean constraints and makes ranked output possible. It has a unique facility, called ZOOM, which does statistical characterization of terms associated with specified document sets. This is in general a frequency analysis of terms in various fields of the retrieved documents. ZOOM thus provides data for establishing statistical relations among terms, and between documents and terms. The information derived from ZOOM, in combination with general statistics of the database, can be used for probabilistic relevance feedback (Robertson, et al, 1986), and in other semi-automatic feedback modes (Ingwersen & McAlpine, 1989).
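To make the two facilities concrete, the sketch below shows a frequency analysis over one field of a retrieved set and a simple coordination-level ranking. It is an illustration under stated assumptions, not ESA-QUEST code: the record layout (dictionaries with an `id` and lists of terms), the function names `zoom` and `quorum_search`, and the reading of quorum searching as ranking by number of matched query terms are all hypothetical simplifications.

```python
from collections import Counter


def zoom(documents, field):
    """Frequency analysis of the terms found in one field of a retrieved set.

    Each term is counted once per document; ties are broken alphabetically.
    """
    counts = Counter()
    for doc in documents:
        counts.update(set(doc.get(field, [])))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def quorum_search(documents, query_terms, field="terms"):
    """Rank documents by how many of the query terms they contain.

    Documents matching no query term are left out of the ranking.
    """
    query = set(query_terms)
    scored = []
    for doc in documents:
        matched = len(query & set(doc.get(field, [])))
        if matched:
            scored.append((matched, doc["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored
```

A strict Boolean AND would return only the documents matching every term; the ranking above keeps partial matches and orders them, which is the sense in which Boolean constraints are relaxed.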

7. Cognitive Task Analysis to System Functionality

In this section, we demonstrate, by example, how our methods can lead to specification of host and interface features for support of end-user search formulation. We do this by identifying the problems associated with each of the search formulation tasks identified in section 5.2, suggesting possible support mechanisms which respond to these problems, and drawing upon the host's functionalities to indicate methods and techniques for accomplishing these mechanisms. This exercise also leads to suggestion of areas of potential enhancement of host capabilities, by identifying problems for which the system's capabilities offer no potential support mechanisms. Below, we present a structured description of each of the search formulation tasks.

Identification of general search topic

Problems:

Terminology
Specification at appropriate level of generality
Matching topic specification to system capabilities

Support:

Non-restricted input
Categorization of system topics
User selection of topics
Structured topic display
System matching of input to topic structure and data base

Methods:

Rich aliasing, through establishment of a 'user thesaurus' (Bates, 1986)
Display of hierarchically organized list of Questindex topics, with optional display of databases associated with topics
Display of frequency characteristics of topic terms in various data bases
Display of relative importance of term in database, derived from frequency characteristics of term, size of database, etc.
Supporting user search and browsing in full text of ESA-QUEST Directory of Data Bases and Services

Database selection

Problems:

Relationships between databases and topics unknown
User unfamiliarity with database characteristics
Relevance of database to topic, goals and desired level of treatment
Whether to choose one or several databases

Support:

Associate structured topic display with relevant databases
Match user topic input to database characteristics
Pre-specification of database topic groups
User specification of database clusters
User interaction with database descriptions and contents

Methods:

Display of hierarchically organized list of Questindex topics, with display of databases associated with topics
Display of frequency characteristics of topic terms in various data bases
Display of relative importance of term in database, derived from frequency characteristics of term, size of database, etc.
Supporting user search and browsing in full text of ESA-QUEST Directory of Data Bases and Services
Display of QuestClusters (pre-specified groups of databases) with term importance characteristics
Incremental construction of user-selected database cluster for cross-file searching
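One way to read "relative importance of term in database" is as the share of a database's records in which the term occurs. The sketch below ranks candidate databases on that basis. It is only an assumption about how such a figure might be derived from posting counts and database sizes; the function name and the example database names are hypothetical.

```python
def relative_importance(term_postings, database_sizes):
    """Rank databases by the fraction of their records that contain a term.

    term_postings: number of records containing the term, per database.
    database_sizes: total number of records, per database.
    """
    ranked = []
    for database, postings in term_postings.items():
        size = database_sizes[database]
        if size > 0:
            ranked.append((database, postings / size))
    # Highest share first; database name breaks ties.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# A small database in which the term is common outranks a large one
# with the same raw posting count.
ranking = relative_importance({"A": 50, "B": 50}, {"A": 1000, "B": 100})
```

Raw frequency alone would treat the two hypothetical databases as equal, which is why database size enters the calculation.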


Initial formulation of search topic

Problems:

Difficulty in statement/specification of requirements
Vocabulary/terminology mismatch
Relationships among terms unknown, or mismatch between user relationships and system relationships
Knowledge of subject area and literature
Relationship of problem to system and database characteristics

Support:

Ease of input - not constrained to specification
Accept variety of input
Display of related terms and relationships within database
Display of topic/context groupings within database
Display of knowledge resources available to interface (from interface and host)
Relationship of user input to system descriptions
Display of current state of topic selection

Methods:

Quasi-natural language input with support for concept/facet specification
Model document as input, with document template for description; use SuperZoom to display controlled and uncontrolled terms from retrieved set
Direct matching of input to system knowledge resources (thesaurus, controlled terms, uncontrolled terms) with display of conceptual and statistical relationships within system
Direct manipulation-based browsing in displays of conceptual and statistical relationships
Direct retrieval of example documents from selected terms; direct choice of terms from examined documents

Query formulation

Problems:

Terminology
Matching of topic description to effective search statement
Effects of using specific formulations
Effects of changing formulations
Identifying appropriate formulations and changes

Support:

Progressive and interactive use of search-topic description for query formulation
Explicit demonstration of relations between input vocabulary and system vocabulary
Display of, and choice from, structured system vocabularies
Structured query representation
Immediate retrieval from query formulations, with explicit demonstration of cause
Example successful formulations

Methods:

Display of search topic description as it is being formulated, with access to system terms related to those in topic description semantically (through thesaurus) or statistically (through Zoom)
Rich aliasing
Thesaural and other index displays
Windows for construction of faceted query formulation
Explanation facilities for 'good' query construction
Ranking of retrieved documents
Direct retrieval of example documents from selected terms; direct choice of terms from examined documents
Display of query formulations for topically related queries

Search strategy formulation

Problems:

Understanding search logic
Relating search logic to topic requirements
Understanding the results of logical operations
Understanding the relationship of logical operations to desired consequences
Identifying appropriate logical statements

Support:

Mask logic from user
Redisplay of search formulation for modification
Structured representation of query and search
Translate displayed topic and query representations into displayed search structures
Relate retrieval results to characteristics of search strategy
Provide patterns for search formulation

Methods:

Quasi-natural language input
Continuous display of underlying search formulation, with explicit relations to topic and query formulations
Visual display of effects of search formulation on retrieved set, step by step
Search strategy formulation within windowed faceted structure
Templates for search strategy types, to be filled in by user
Ranked retrieval output

Evaluation and reformulation

Problems:

Understanding system response
Finding representative documents for evaluation
Relating output to desirable changes
Relating output to characteristics of search formulation
Incorporating appropriate modifications

Support:

Explanation of factors that led to response (visually and textually)
Ranked output of documents
Manipulable display of output related to query and search formulation
Suggestions for appropriate changes

Methods:

Graphic display of immediate history of response
Explanation of immediate history of response
Quorum searching
Graphic display of underlying search formulation, with explicit relations to topic and query formulations, and to response
Direct selection of items from retrieved set to be incorporated into search
Facility for indicating degree of relevance of response, for indicating characteristics of interest and undesirable characteristics, and for maintaining record of decisions
Semi-automatic relevance feedback based on user evaluations and on Zoom and thesaural structures, automatically invoked
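As an illustration of the last method, relevance feedback can weight candidate terms from the user's relevance judgements and the term counts that a Zoom-style frequency analysis supplies. The sketch below uses the relevance weight of Robertson & Sparck Jones (1976) with the usual 0.5 correction. Choosing this particular formula is an assumption made for the example; the methods listed above do not prescribe one, and the function names are hypothetical.

```python
import math


def relevance_weight(N, n, R, r):
    """Robertson/Sparck Jones relevance weight for one term.

    N: documents in the database        n: documents containing the term
    R: documents judged relevant        r: relevant documents containing the term
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def rank_feedback_terms(term_counts, N, R):
    """Order candidate terms by weight; term_counts maps term -> (n, r)."""
    weighted = [(term, relevance_weight(N, n, R, r))
                for term, (n, r) in term_counts.items()]
    return sorted(weighted, key=lambda pair: (-pair[1], pair[0]))
```

A term that occurs in most of the documents judged relevant but in few others receives a high weight, so it would be among the first suggested to the user for inclusion in the reformulated search.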

8. Constraints on the Interface

The potential users of ESA-IRS are a large, international population, heterogeneous on many dimensions, including subject matter, backgrounds, working environments, IR system experience, computing experience, and so on. We can suppose that most of these users will be accessing the system in their work roles, many of them as direct end users, others as proxies or intermediaries or representatives of groups. There will be substantial variety among these users in the extent to which they access ESA-IRS and other IR systems. Their access to ESA-IRS will typically be through micro-computers or workstations connected via modems or local area networks to telecommunications networks on which ESA-IRS is a host.

ESA-IRS will remain a large, primarily centralized facility for access to a wide variety of data bases, but with, of course, a focus on those of relevance to the ESA mission. We assume that it will continue to be innovative in research and development policies concerning system enhancements, but also that it will maintain its basic structure and facilities. We also assume that the ESA-IRS policy of promoting user-system interaction (as, for instance, in pricing) will at least be maintained.

All of these factors, combined with general precepts for human-computer interaction in information systems, have a number of implications for ideal interface design and features. Among these, we mention especially:

the need to maintain a variety of access/interaction modes, to respond to the variety of users;

the significance of intelligent explanation facilities, again in response to user variety;

the ability to maintain long-term models of system users;

general interface flexibility, in terms of tolerance of a variety of data bases and search topics;

support for effective interaction, rather than merely for efficient or quick searching;

distribution of aspects of the interface between host and local access computers;

portability and generality of direct end-user interface software (for mounting on local machines);

window-based interfaces, with a variety of interaction modes, including direct manipulation, menus, forms, graphics, quasi-natural language and command languages.

These implications can be considered as aspects of our method for specification of interface design.

9. Stage One Interface Implementation

Our design specification procedure, as outlined in the previous sections, suggests a general structure for a level one implementation of an intelligent interface to ESA-IRS. Here, we outline the characteristics of such an interface, and then offer an example of what implementation of this interface in support of one kind of activity would look like.

The intelligent interface would be one of three different interfaces available to the system, for choice by user or recommendation by interface (at any point in the interaction). These are: the native command language (ESA-QUEST); a form-filling menu interface based on a standard document template; and an interface for search formulation support. The first of these is the current access mode, and the second is at present an internal prototype. This multiple interface structure responds to the variety of user preferences and IR experience, and responds to some extent to the Dialogue Mode goal.

The intelligent interface will be window based, with windows corresponding to the tasks to be performed in support of any one function. Multiple windows will be available at all stages, with a history window and a current search formulation window available at all times. This responds to requirements of interaction and clarity.


The basic mode of input will be unconstrained natural language, which will be interpreted by the system as stemmed key words, with stop words removed. Word combinations will be variously interpreted according to the specific function invoked. The other major method of input will be direct selection of displayed items, for instance by pointing and selecting items in a graphic display, or items in a document display. Input into predetermined formats, for instance search logic templates, will also be supported. These input modes require keyboard and mouse (or similar) interaction. This responds to issues of vocabulary problems, and of interaction support.
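The interpretation of free input as stemmed key words can be pictured with the following minimal sketch. The stop word list and the crude suffix stripping are placeholders chosen for the example; the stemming procedure and stop list actually used with ESA-QUEST are not specified here, and a production interface would rely on an established stemmer.

```python
import re

# Hypothetical stop list; a real system would use a much longer one.
STOP_WORDS = {"a", "an", "for", "of", "on", "the", "to", "with"}

# Suffixes tried in order; deliberately crude, for illustration only.
SUFFIXES = ("ing", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def key_words(text):
    """Lower-case the input, drop stop words, and stem what remains."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(word) for word in words if word not in STOP_WORDS]
```

Reducing both the user's words and the indexed words to the same stems is what lets variant forms of a word match each other, which is the vocabulary problem this input mode addresses.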

The basic mode of display from the system will be graphic, giving a visualization of the structure of the search, a visualization of term relationships, a visualization of the history of the session, and so on. These visualizations are constructed on the basis of initial user input, and are intended for direct manipulation, modification and selection by the user. Thus, the user can change aspects of the visualization, when appropriate, can request information about any aspect of the visualization, can move to consider any aspect of the visualization, can invoke an action, and view its result by manipulation in the visualization, and so on. This form of presentation and interaction responds to various issues; one important one is that it provides a natural structure for showing the current state of the search, and the relationship of that to what has come before (and perhaps to what might follow); another is that it provides a powerful framework for browsing, which we consider to be a primary support function for search formulation.

The interface will maintain a long-term user model for each system user, which amounts to a tailored user profile. This will include data on total uses of the system, databases used, interaction modes used, explanation levels used, and user-specified preferences. The model will be used for initial choice of interaction mode, explanation level, level of search formulation support and choice of database selection method (direct or negotiated, for instance).
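A profile of this kind could be held in a simple record such as the one sketched below. The field names follow the data items listed above; the class name, the method names and the rule for choosing the initial interaction mode (stated preference first, then the most frequently used mode, then a default) are assumptions introduced for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Long-term, per-user record kept by the interface (hypothetical layout)."""
    user_id: str
    total_uses: int = 0
    databases_used: Counter = field(default_factory=Counter)
    interaction_modes_used: Counter = field(default_factory=Counter)
    explanation_levels_used: Counter = field(default_factory=Counter)
    preferences: dict = field(default_factory=dict)

    def record_session(self, databases, mode, explanation_level):
        """Update the profile at the end of a session."""
        self.total_uses += 1
        self.databases_used.update(databases)
        self.interaction_modes_used[mode] += 1
        self.explanation_levels_used[explanation_level] += 1

    def initial_mode(self, default="search formulation support"):
        """Stated preference first, then the most used mode, then a default."""
        if "interaction_mode" in self.preferences:
            return self.preferences["interaction_mode"]
        if self.interaction_modes_used:
            return self.interaction_modes_used.most_common(1)[0][0]
        return default
```

Because the record holds only data accumulated across sessions, it matches the stage one limitation that knowledge of the user within a session is not modelled.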

This suggests the basic environment of the stage one intelligent interface facility. Our next steps are to implement this environment in a prototype, to implement the specific functions suggested in the previous section within this environment, and to engage in an experimental formative design exercise with this prototype. This facility responds to our ideal design in several ways, but remains only a stage one implementation especially because it does not incorporate knowledge of the user within the session, and especially because it does not respond actively to the user's situation, but rather depends highly on user initiative. Below, we offer an example of how this stage one intelligent interface could deal with one specific aspect of search formulation support, even within its limited goals.

The ESA-QUEST retrieval system already has available a tool to be used for manual feedback that can be incorporated in the interface for semi-automatic or automatic query reformulation, that is, the ZOOM facility. Possible uses of the actual tool have already been described by Robertson et al. (1986), Ingwersen & Wormell (1986) and McAlpine & Ingwersen (1989).

How to extract from the host IR system all the potential already built in is an open question. In examining possible areas of improvement, it emerged that there is an area that could help the user in formulating a query that is not completely exploited; that is, thesaurus browsing. Several on-line bibliographic databases are enriched by an on-line thesaurus. Thesaurus creation entails a big effort from the file producer, but it is often difficult for an inexperienced user to benefit from its on-line availability. Some attempts have been made to use the thesaurus for concept specification and automatic query formulation (see e.g. Giger (1988)).

One aspect of an intelligent interface is to support terminology choice in problem description. We propose to implement this type of support by means of an interactive tool able to browse the thesaurus "before" the search action, when no set has yet been retrieved. As is well known, in existing large information retrieval systems the thesaurus is available but its accessibility and visibility are rather difficult for inexperienced users. In particular, if the concept the user has in mind is expressed by term(s) or phrase(s) that are not thesaurus entries, it is very difficult for the user to browse it.

In the basic functional implementation of the interface a key tool will be a browse function with the ability of finding (pointing to) thesaurus entries and from these entries browsing the thesaurus hierarchy. The major effort has been put into finding, in a completely automatic and transparent way, conceptually related terms that are entries in the thesaurus in case the concept (term or phrase) that the user has in mind is not a thesaurus entry itself. This would ease the concept identification stage in the PD phase by simple browsing of the thesaurus. In order to make the query formulation process as straightforward as possible, the user is given the possibility either to simply browse the thesaurus or to search directly for a term in the hierarchy, thus retrieving a set.
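Browsing the hierarchy from a chosen entry can be sketched as a depth-limited walk over narrower terms. The representation of the thesaurus as a mapping from each entry to its narrower terms, the depth limit, and the example terms are assumptions made for this illustration; they are not taken from any ESA-QUEST thesaurus.

```python
def hierarchy(thesaurus, entry, depth=0, max_depth=2):
    """Return (depth, term) rows for an entry and its narrower terms.

    thesaurus maps an entry to a list of its narrower terms. The depth
    limit keeps the display short and guards against circular links.
    """
    rows = [(depth, entry)]
    if depth < max_depth:
        for narrower in sorted(thesaurus.get(entry, [])):
            rows.extend(hierarchy(thesaurus, narrower, depth + 1, max_depth))
    return rows


# Hypothetical thesaurus fragment.
example = {
    "computers": ["microcomputers", "mainframes"],
    "microcomputers": ["personal computers"],
}
for level, term in hierarchy(example, "computers"):
    print("  " * level + term)
```

Each displayed row could then be offered to the user either as a new starting point for browsing or as a search term, in line with the two options described above.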

Thus the core of the basic functional support that is being implemented on ESA-QUEST is a browse function capable of linking thesaurus entries to a term or phrase entered by the user before any search process is started, and thereafter in conjunction with the search. It is possible to think of several solutions to the problem. Since our environment (ESA-QUEST) entails mainly use of large bibliographic databases, our approach has been to use statistical techniques.

If the term (phrase) the user is entering is not in the thesaurus then a sample of the documents containing that term (phrase) is examined. The controlled keywords (control terms) of the documents in the sample are ranked according to their frequency in the sample of documents (this uses the ZOOM command). The top five controlled keywords that are also thesaurus entries are shown to the user. As stated above, phrases (multiple terms) are accepted in input, but no real natural language processing is performed on them. This means that no verbs should be used and that (at least at this stage of the project) only a simple pre-processing is performed. In this pre-processing, terms like "and" and "or" are treated as logical ANDs or ORs and the term "in" is transformed into a logical "AND". The kinds of phrases the user can input are then analogous to those the user can search for via the Common Command Language command "find". In figures 3-8 some examples of a basic interface dialogue using this browsing function are shown. The results make reference to the file INSPEC as loaded on ESA-QUEST in January.
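The procedure just described can be sketched as follows. This is an illustration under stated assumptions rather than the ESA-QUEST implementation: the record layout, the function names, the sample size of 100 and the treatment of adjacent words as a single phrase are hypothetical; only the operator mapping and the "top five" cut-off come from the description above.

```python
from collections import Counter

OPERATORS = {"and": "AND", "or": "OR", "in": "AND"}


def preprocess_phrase(phrase):
    """Split a user phrase into terms and logical operators."""
    tokens, current = [], []
    for word in phrase.lower().split():
        if word in OPERATORS:
            if current:
                tokens.append(" ".join(current))
                current = []
            # Ignore an operator that has no term immediately before it.
            if tokens and tokens[-1] not in ("AND", "OR"):
                tokens.append(OPERATORS[word])
        else:
            current.append(word)
    if current:
        tokens.append(" ".join(current))
    elif tokens and tokens[-1] in ("AND", "OR"):
        tokens.pop()  # drop a trailing operator
    return tokens


def suggest_thesaurus_entries(term, documents, thesaurus, sample_size=100, top=5):
    """Link a user term to thesaurus entries via controlled-term frequencies."""
    if term in thesaurus:
        return [term]
    # Sample the documents whose text contains the term.
    sample = [doc for doc in documents if term in doc["text"]][:sample_size]
    counts = Counter()
    for doc in sample:
        counts.update(set(doc["controlled_terms"]))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Keep only controlled terms that are also thesaurus entries.
    return [name for name, _ in ranked if name in thesaurus][:top]
```

In this sketch the frequency ranking plays the part that the ZOOM command plays on the host, and the final filter corresponds to showing only those controlled keywords that are thesaurus entries.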

Figure 3 shows how the browse-thesaurus option is chosen, and the desired term input. Figure 4 shows the conceptually related terms to micro computer, which have been determined by invoking ZOOM on the controlled terms in the documents in which the term micro computer occurs (in any field). Figure 5 indicates how browsing in the thesaurus itself is chosen, and figure 6 shows the resulting hierarchy. Figure 7 is the input of a full phrase for browsing (where 'in' is interpreted by the interface as logical AND), and figure 8 is the result of the frequency analysis of the set retrieved by the phrase, interpreted as search.

This sequence thus demonstrates some characteristics of the input and manipulation of the interface, and also, importantly, of the combination of various underlying host functions into automatic procedures for responding to user problems.

10. References

BATES, M.J. (1986). Subject access in online catalogs: a design model. Journal of the American Society for Information Science, v. 37: 357-376.

BELKIN, N.J. (1988). On the nature and function of explanation in intelligent information retrieval. In: Proceedings of the 11th ACM SIGIR International Conference on Research and Development in Information Retrieval, Grenoble, 1988. Grenoble, Presses Universitaires de Grenoble: 135-145.

BELKIN, N.J., BROOKS, H.M. & DANIELS, P.J. (1987). Knowledge elicitation using discourse analysis. International Journal of Man-Machine Studies, v. 27: 127-144.

BELKIN, N.J., HENNINGS, R.-D. & SEEGER, T. (1984). Simulation of a distributed expert-based information provision mechanism. Information Technology: Research, Development, Applications, v. : 22-141.

BELKIN, N.J., SEEGER, T. & WERSIG, G. (1983). Distributed expert problem treatment as a model for information system analysis and design. Journal of Information Science, v. 5: 153-167.

BELKIN, N.J. & VICKERY, A. (1985). Interaction in information systems. (Library & Information Research Report 35). London, The British Library.

BELKIN, N.J. et al. (1987). Distributed expert-based information systems: An interdisciplinary approach. Information Processing and Management, v. 23: 395-409.

BORGMAN, C.L., CASE, D.O. & MEADOW, C.T. (1989). The design and evaluation of a front-end interface for energy researchers. Journal of the American Society for Information Science, v. 40: 99-109.

BROOKS, H.M. (1986). retrieval retrreval Information

CROFT, W.B. & THOMPSON, R.H. (1987). I3R: A new approach to the design of document retrieval systems. Journal of the American Society for Information Science, v. 38: 389-404.

DANIELS, P.J. (1987). De;&ozAE&iE; ~~~~i~~~~l~i~~e~~nct~ an intelliqent interface Y FFiesis, London.

Department of Information Science, The City Univers

DANIELS, P.J., BROOKS, H.M. & BELKIN, N.J. (1985). Using problem structures for driving human-computer dialogues. In: RIAO '85. Actes of the Conference: Recherche d'Informations Assistee par Ordinateur, Grenoble, March 1985. Grenoble, I.M.A.G.: 131-149.

EGAN, D.E., et al. (1989). Formative design-evaluation of SuperBook. ACM Transactions on Information Systems, v. 7: 30-57.

GIGER, H.P. (1988). Concept Based Retrieval in Classical IR Systems. In: Proceedings of the 11th ACM SIGIR International Conference on Research and Development in Information Retrieval, Grenoble, 1988. Grenoble, Presses Universitaires de Grenoble: 275-290.

INGWERSEN, P. & WORMELL, I. (1986). Improved Subject Access, Browsing and Scanning Mechanisms in Modern Online IR. In: Proceedings of the 9th ACM SIGIR International Conference on Research and Development in Information Retrieval, Pisa, 1986. New York, ACM: 68-76.

McALPINE, G. & INGWERSEN, P. (1989). Integrated information retrieval in a knowledge worker support system. In: Proceedings of the 12th ACM SIGIR International Conference on Research and Development in Information Retrieval, Cambridge, MA, 1989. New York, ACM: 48-57.

MARCUS, R.S. (1985). Development and testing of expert systems for retrieval assistance. In: Proceedings of the 48th Annual Meeting of the ASIS, vol. 22. White Plains, NY, Knowledge Industry Publications: 289-292.

MARCUS, R.S. & REINTJES, J.F. (1981). A translating computer interface for end-user operation of heterogeneous retrieval systems. I. Design & II. Evaluations. Journal of the American Society for Information Science, v. 32: 287-303; 304-317.

MEADOW, C.T., HEWETT, T.T. & AVERSA, E.S. A computer intermediary for interactive database searching. I. Design & II. Evaluation. Journal of the American Society for Information Science, v. 33: 325-332; 357-364.

MEADOW, C.T., et al. (1989). Online Access to Knowledge: System design. Journal of the American Society for Information Science, v. 40: 86-98.

ODDY, R.N. (1977). Information retrieval through man-machine dialogue. Journal of Documentation, v. 33: 1-14.

ROBERTSON, S.E. & SPARCK JONES, K. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science, v. 27: 129-146.

ROBERTSON, S.E., THOMPSON, C.L., MACASKILL, M.J. & BOVEY, J.D. (1986). Weighting, ranking and relevance feedback in a front-end system. Journal of Information Science, v. 12: 71-75.

ROTH, E.M. & WOODS, D.D. (1989). Cognitive task analysis: An approach to knowledge acquisition for intelligent system design. In: Topics in expert system design: Methodologies and tools, G. Guida & C. Tasso, eds. Amsterdam, North-Holland.

VICKERY, A. (1988). The experience of building expert search systems. In: Online Information 88. Proceedings of the 12th International Online Information Meeting, London, December 1988. Oxford, Learned Information: 301-313.

VICKERY, A., BROOKS, H.M., ROBINSON, B.A. & STEPHENS, J. (1988). An expert system for referral (Library and Information Research Report). London, The British Library.


Figure 3. Choosing to browse the thesaurus, and specifying a starting term.

Figure 4. Display of related terms from thesaurus, derived from frequency analysis in retrieved document set.

Figure 5. Choosing a term to browse from in the thesaurus.

Figure 6. Thesaurus hierarchy display.

Figure 7. Natural language phrase input for thesaurus browsing.

Figure 8. Thesaurus terms related to input phrase.
