
Knowledge management technology

    by A. D. Marwick

Selected technologies that contribute to knowledge management solutions are reviewed using Nonaka's model of organizational knowledge creation as a framework. The extent to which knowledge transformation within and between tacit and explicit forms can be supported by the technologies is discussed, and some likely future trends are identified. It is found that the strongest contribution to current solutions is made by technologies that deal largely with explicit knowledge, such as search and classification. Contributions to the formation and communication of tacit knowledge, and support for making it explicit, are currently weaker, although some encouraging developments are highlighted, such as the use of text-based chat, expertise location, and unrestricted bulletin boards. Through surveying some of the technologies used for knowledge management, this paper serves as an introduction to the subject for those papers in this issue that discuss technology.

The goal of this paper is to provide an overview of technologies that can be applied to knowledge management and to assess their actual or potential contribution to the basic processes of knowledge creation and sharing within organizations. The aim is to identify trends and new developments that seem to be significant and to relate them to technology research in the field, rather than to provide a comprehensive review of available products.

Knowledge management (see, for example, Davenport and Prusak1) is the name given to the set of systematic and disciplined actions that an organization can take to obtain the greatest value from the knowledge available to it. Knowledge in this context includes both the experience and understanding of the people in the organization and the information artifacts, such as documents and reports, available within the organization and in the world outside. Effective knowledge management typically requires an appropriate combination of organizational, social, and managerial initiatives along with, in many cases, deployment of appropriate technology. It is the technology and its applicability that is the focus of this paper.

To structure the discussion of technologies, it is helpful to classify the technologies by reference to the notions of tacit and explicit knowledge introduced by Polanyi in the 1950s2,3 and used by Nonaka4,5 to formulate a theory of organizational learning that focuses on the conversion of knowledge between tacit and explicit forms. Tacit knowledge is what the knower knows, which is derived from experience and embodies beliefs and values. Tacit knowledge is actionable knowledge, and therefore the most valuable. Furthermore, tacit knowledge is the most important basis for the generation of new knowledge; that is, according to Nonaka, "the key to knowledge creation lies in the mobilization and conversion of tacit knowledge."5 Explicit knowledge is represented by some artifact, such as a document or a video, which has typically been created with the goal of communicating with another person. Both forms of knowledge are important for organizational effectiveness.6

Copyright 2001 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.

MARWICK 0018-8670/01/$5.00 2001 IBM. IBM SYSTEMS JOURNAL, VOL 40, NO 4, 2001, p. 814


These ideas lead us to focus on the processes by which knowledge is transformed between its tacit and explicit forms, as shown in Figure 1.5 Organizational learning takes place as individuals participate in these processes, since by doing so their knowledge is shared, articulated, and made available to others. Creation of new knowledge takes place through the processes of combination and internalization. As shown in Figure 1, the processes by which knowledge is transformed within and between forms usable by people are:

Socialization (tacit to tacit): Socialization includes the shared formation and communication of tacit knowledge between people, e.g., in meetings. Knowledge sharing is often done without ever producing explicit knowledge and, to be most effective, should take place between people who have a common culture and can work together effectively (see Davenport and Prusak,1 p. 96). Thus tacit knowledge sharing is connected to ideas of communities and collaboration. A typical activity in which tacit knowledge sharing can take place is a team meeting during which experiences are described and discussed.

Externalization (tacit to explicit): By its nature, tacit knowledge is difficult to convert into explicit knowledge. Through conceptualization, elicitation, and ultimately articulation, typically in collaboration with others, some proportion of a person's tacit knowledge may be captured in explicit form. Typical activities in which the conversion takes place are in dialog among team members, in responding to questions, or through the elicitation of stories.

Combination (explicit to explicit): Explicit knowledge can be shared in meetings, via documents, e-mails, etc., or through education and training. The use of technology to manage and search collections of explicit knowledge is well established. However, there is a further opportunity to foster knowledge creation, namely to enrich the collected information in some way, such as by reconfiguring it, so that it is more usable. An example is to use text classification to assign documents automatically to a subject schema. A typical activity here might be to put a document into a shared database.

Internalization (explicit to tacit): In order to act on information, individuals have to understand and internalize it, which involves creating their own tacit knowledge. By reading documents, they can to some extent re-experience what others previously learned. By reading documents from many sources, they have the opportunity to create new knowledge by combining their existing tacit knowledge with the knowledge of others. However, this process is becoming more challenging because individuals have to deal with ever-larger amounts of information. A typical activity would be to read and study documents from a number of different databases.
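The automatic classification mentioned under combination can be sketched in a few lines. The schema categories, keyword lists, and sample document below are invented for illustration; production classifiers use statistical methods rather than simple keyword overlap.

```python
# Minimal sketch: assign a document to a subject category by keyword overlap.
# The schema and keyword lists are invented for illustration.
SCHEMA = {
    "networking": {"router", "packet", "latency", "bandwidth"},
    "databases": {"query", "index", "transaction", "schema"},
    "hr": {"hiring", "benefits", "payroll", "vacation"},
}

def classify(text):
    """Return the schema category whose keywords best match the text."""
    words = set(text.lower().split())
    scores = {cat: len(words & kws) for cat, kws in SCHEMA.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

doc = "Slow query performance traced to a missing index on the orders table"
print(classify(doc))  # -> databases
```

Assigning documents to categories in this way is what allows a collection to be browsed by subject rather than only searched.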

These processes do not occur in isolation, but work together in different combinations in typical business situations. For example, knowledge creation results from the interaction of persons and tacit and explicit knowledge. Through interaction with others, tacit knowledge is externalized and shared.7 Although individuals, such as employees, for example, experience each of these processes from a knowledge management and therefore an organizational perspective, the greatest value occurs from their combination since, as already noted, new knowledge is thereby created, disseminated, and internalized by other employees who can therefore act on it and thus form new experiences and tacit knowledge that can in turn be shared with others, and so on.7 Since all the processes of Figure 1 are important, it seems likely that knowledge management solutions should support all of them, although we must recognize that the balance between them in a particular organization will depend on the knowledge management strategy used.8

Table 1 shows some examples of technologies that may be applied to facilitate the knowledge conversion processes of Figure 1. These technologies and others are discussed in this paper. The individual technologies are not in themselves knowledge management solutions. Instead, when brought to market they are typically embedded in a smaller number of solution packages, each of which is designed to be adaptable to solve a range of business problems. Examples are portals, collaboration software, and distance learning software. Each of these can and does include several different technologies.

[Figure 1: Conversion of knowledge between tacit and explicit forms (after Nonaka)]

The approach to the technology of knowledge management in this paper emphasizes human knowledge. Sometimes in computer science knowledge management is interpreted to mean the acquisition and use of knowledge by computers, but that is not the meaning used here. In any case, automatic extraction of deep knowledge (i.e., in a form that captures the majority of the meaning) from documents is an elusive goal. Today the level of automatic extraction is deemed to be rather shallow because only a subset of the meaning, sometimes a very limited one, can be captured, ranging from recognition of entities such as proper names or noun phrases to automatic extraction of ontological relations of various kinds (e.g., References 9 and 10), and there is no system that can reason (in the sense of deducing something new from what it already knows) over the extracted knowledge in a way that even approaches the capabilities of a human. As an example of the current state of the art in applications for extracting knowledge automatically, Figure 2 shows a system11 for analyzing reports of appellate court decisions to find the precedents they may affect. Court opinions are analyzed to find language that refers to other cases that the opinion may modify or invalidate. The candidate cases are retrieved from a database of law reports and are presented to an analyst for final judgment. The results are used to enrich the database with appropriate cross-references. Here the approach is that a template defines the fragment of knowledge to be sought, and the system tries to fill it by extracting information from the text. However, the candidate pieces of extracted knowledge must still be presented to a human for review and final decision, so that the value of the system is in increasing the productivity of the human analysts. For the foreseeable future, knowledge management in business will be about human knowledge in its various forms.
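The template-filling approach can be illustrated with a toy sketch. The pattern, the verbs it looks for, and the opinion text are simplified inventions, not the actual system described above, but the shape is the same: a template defines the fragment of knowledge sought, the system fills candidate instances from text, and each candidate is queued for human review.

```python
import re

# Toy sketch of template filling: scan opinion text for language that may
# affect earlier cases, and fill a citation template for analyst review.
# The pattern and phrasing are simplified inventions, not the cited system.
PATTERN = re.compile(
    r"(?P<action>overrule[sd]?|abrogate[sd]?|disapprove[sd]?)\s+"
    r"(?P<case>[A-Z][A-Za-z]+ v\. [A-Z][A-Za-z]+)"
)

def extract_candidates(opinion_text):
    """Return filled citation templates; each still needs human review."""
    return [
        {"action": m.group("action"), "case": m.group("case"), "reviewed": False}
        for m in PATTERN.finditer(opinion_text)
    ]

text = "We therefore overrule Smith v. Jones to the extent it conflicts."
print(extract_candidates(text))
```

The `reviewed` flag stands for the final human judgment step that the paper emphasizes: extraction proposes, the analyst decides.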

The use of technology in knowledge management is not new, and considerable experience has been built up by the early pioneers. Even before the availability of solutions such as Lotus Notes**12 on which many contemporary knowledge management solutions are based, companies were deploying intranets, such as EPRINET,13 based on early generations of networking and computer technology that improved access to knowledge on line. Collaboration and knowledge sharing solutions also arose from the development of on-line conferencing and forums14 using mainframe computer technology. Today, of course, intranets and the Internet are ubiquitous, and we are rapidly approaching the situation where all the written information needed by a person to do his or her job is available on line. However, that is not to say that it can be used effectively with the tools currently available.

It is important to note that knowledge management problems can typically not be solved by the deployment of a technology solution alone. The greatest difficulty in knowledge management identified by the respondents in a survey15 was changing people's behavior, and the current biggest impediment to knowledge transfer was culture. Overcoming technological limitations was much less important. The role of technology is often to overcome barriers of time or space that otherwise would be the limiting factors. For example, a research organization divided among several laboratories in different countries needs a system that scientists with common interests can use to exchange information with each other without traveling, whereas a document management system can ensure that valuable explicit knowledge is preserved so that it can be consulted in the future.

[Figure 2: Information extraction using template filling.11 Flow: court opinion, find candidate language, filled citation templates, structured query language, law reports database, list of candidate cases, human review.]

Table 1: Examples of technologies that can support or enhance the transformation of knowledge

  Tacit to Tacit: E-meetings; synchronous collaboration (chat)
  Tacit to Explicit: Answering questions; annotation
  Explicit to Tacit: Visualization; browsable video/audio of presentations
  Explicit to Explicit: Text search; document categorization

Two caveats must be stated at this point. First is the point made by Ackerman16 that in many respects the state of the art is such that many of the social aspects of work important in knowledge management cannot currently be addressed by technology. Ackerman refers to this situation as a social-technical gap. Second, the coupling between behavior and technology is two-way: the introduction of technology may influence the way individuals work. People can and do adapt their way of working to take advantage of new tools as they become available, and this adaptation can produce new and more effective communication within teams (e.g., the effect of introducing solutions based on Lotus Notes on process teams in a paper mill described by Robinson et al.,17 or the adaptations made by people in a customer support organization studied by Orlikowski18 after Notes was introduced).

Other surveys of technology for knowledge management can be found in the book Working Knowledge by Davenport and Prusak1 and in a paper by Jackson.19 Prospects for using artificial intelligence (AI) techniques in knowledge management have been discussed recently by Smith and Farquhar.20

In the following sections of this paper the technologies that support the processes of Figure 1 are described in more detail and illustrated with examples drawn largely from current research projects.

    Tacit to tacit

The most typical way in which tacit knowledge is built and shared is in face-to-face meetings and shared experiences, often informal, in which information technology (IT) plays a minimal role. However, an increasing proportion of meetings and other interpersonal interactions use on-line tools known as groupware. These tools are used either to supplement conventional meetings or, in some cases, to replace them. To what extent can these tools facilitate the formulation and transfer of tacit knowledge?

Groupware. Groupware is a fairly broad category of application software that helps individuals to work together in groups or teams. Groupware can to some extent support all four of the facets of knowledge transformation. To examine the role of groupware in socialization, we focus on two important aspects: shared experiences and trust.

Shared experiences are an important basis for the formation and sharing of tacit knowledge. Groupware provides a synthetic environment, often called a virtual space, within which participants can share certain kinds of experience; for example, they can conduct meetings, listen to presentations, have discussions, and share documents relevant to some task. Indeed, if a geographically dispersed team never meets face to face, the importance of shared experiences in virtual spaces is proportionally enhanced. An example of current groupware is Lotus Notes,12 which facilitates the sharing of documents and discussions and allows various applications for sharing information and conducting asynchronous discussions to be built. Groupware might be thought to mainly facilitate the combination process, i.e., sharing of explicit knowledge. However, the selection and discussion of the explicit knowledge to some degree constitutes a shared experience.

A richer kind of shared experience can be provided by applications that support real-time on-line meetings, a more recent category of groupware. On-line meetings can include video and text-based conferencing, as well as synchronous communication and chat. Text-based chat is believed to be capable of supporting a group of people in knowledge sharing in a conversational mode.21 Commercial products of this type include Lotus Sametime** and Microsoft NetMeeting**. These products integrate both instant messaging and on-line meeting capabilities. Instant messaging is found to have properties between those of the personal meeting and the telephone: it is less intrusive than interrupting a person with a question but more effective than the telephone in broadcasting a query to a group and leaving it to be answered later.

In work on the Babble system,22 chat was evaluated by at least some users as being ". . . much more like conversation," which is promising for the kind of dialog in which tacit knowledge might be formed and made explicit. However, not all on-line meeting systems have the properties of face-to-face meetings. For example, the videoconferencing system studied by Fish et al.23 was judged by its users to be more like a video telephone than like a face-to-face meeting. Currently, rather than replacing face-to-face meetings, many on-line meetings are found to complement existing collaboration systems and the well-established phone conference and are therefore probably more suited to the exchange of explicit rather than tacit knowledge. On-line meetings extend phone conferences by allowing application screens to be viewed by the participants or by providing a shared whiteboard. An extension is for part of the meeting to take place in virtual reality with the participants represented by avatars.24 One research direction is to integrate on-line meetings with classic groupware-like applications that support document sharing and asynchronous discussion. An example is the IBM-Boeing TeamSpace project,25 which helps to manage both the artifacts of a project and the processes followed by the team. On-line meetings are recorded as artifacts and can be replayed within TeamSpace, thus allowing even individuals who were not present in the original meeting to share some aspects of the experience.

Some of the limitations of groupware for tacit knowledge formation and sharing have been highlighted by recent work on the closely related issue of the degree of trust established among the participants.26 It was found that videoconferencing (at high resolution, not Internet video) was almost as good as face-to-face meetings, whereas audio conferencing was less effective and text chat least so. These results suggest that a new generation of videoconferencing might be helpful in the socialization process, at least in so far as it facilitates the building of trust. But even current groupware products have features that are found to be helpful in this regard. In particular, access control, which is a feature of most commercial products, enables access to the discussions to be restricted to the team members if appropriate, which has been shown22 to encourage frankness and build trust.

Another approach to tacit knowledge sharing is for a system to find persons with common interests, who are candidates to join a community. In Foner's Yenta system,27 the similarity of the documents used by people allowed the system to infer that their interests were similar. Location of other people with similar interests is a function that can be added to personalization systems, the goal of which is to route incoming information to individuals interested in it. There are obvious privacy problems to overcome.
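The document-similarity idea behind systems of this kind can be sketched as follows. The names and sample texts are invented, and a real system would use weighted terms (e.g., tf-idf) over much larger document sets.

```python
import math
from collections import Counter

# Sketch of inferring common interests from the documents people use:
# represent each person by a term-frequency vector over their documents
# and compare vectors by cosine similarity. Names and texts are invented.
def profile(docs):
    return Counter(word for d in docs for word in d.lower().split())

def cosine(p, q):
    dot = sum(p[w] * q[w] for w in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

alice = profile(["speech recognition error rates", "speech transcription"])
bob = profile(["speech recognition for broadcast news"])
carol = profile(["quarterly payroll and benefits review"])

print(cosine(alice, bob) > cosine(alice, carol))  # -> True
```

A system would suggest that the two people with the more similar profiles are candidates to form a community, subject to the privacy concerns noted above.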

Expertise location. Suppose one's goal is not to find someone with common interests but to get advice from an expert who is willing to share his or her knowledge. Expertise location systems have the goal of suggesting the names of persons who have knowledge in a particular area. In their simplest form, such systems are search engines for individuals, but they are only as good as the evidence that they use to infer expertise. Some possible sources of such evidence are shown in Table 2.

Table 2: Sources of evidence for an expertise location system

  A profile or form filled in by a user
  An existing company database, for example one held by the Human Resources department
  Name-document associations
  Questions answered

The problem with using an explicit profile is that persons may not be motivated to keep it up to date, since to them it is just another form to fill in. Thus it is preferable to gather information automatically, if possible, from existing sources. For example, a person's resume or a list of the project teams that he or she has worked on may exist in a company database. Another automatic approach is to infer expertise from the contents of documents with which a person's name is associated. For example, authorship (creation or editing) of a document presumably indicates some familiarity with the subjects it discusses, whereas activities such as reading indicate some interest in the subject matter. Two approaches to using document evidence for expertise location suggest themselves: either the documents can be classified according to some schema, thus classifying their authors; or, when a user submits a query to the expertise location system, it searches the documents, transforms the query to a list of authors (suitably weighted), and returns the list as the result of the expertise search.
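The second approach, turning a document search into a ranked list of authors, can be sketched as follows. The corpus, author names, and scoring are invented for illustration; a real system would use a full-text search engine and more careful weighting.

```python
from collections import defaultdict

# Sketch: run the query against the document collection, then aggregate
# the matches by author so the result is a ranked list of people rather
# than documents. The toy corpus and scoring are invented.
CORPUS = [
    {"author": "chen", "text": "tuning ldap directory replication"},
    {"author": "chen", "text": "ldap schema design notes"},
    {"author": "ruiz", "text": "ldap troubleshooting tips"},
    {"author": "ruiz", "text": "printer driver installation"},
]

def find_experts(query):
    terms = set(query.lower().split())
    scores = defaultdict(float)
    for doc in CORPUS:
        hits = len(terms & set(doc["text"].split()))
        if hits:
            scores[doc["author"]] += hits  # weight authors by match strength
    return sorted(scores, key=scores.get, reverse=True)

print(find_experts("ldap replication"))  # -> ['chen', 'ruiz']
```

The ranking reflects how strongly each person's documents match the query, which is exactly the "suitably weighted" list of authors described above.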

The current state of the art is to use the first three sources of evidence listed in Table 2: explicit profiles, evidence mined from existing databases, and evidence inferred from the association of persons and documents. For example, the Lotus Discovery Server** product contains a facility whereby an individual's expertise is determined using these techniques,28 while it and the Tacit Knowledge Systems KnowledgeMail** product29 analyze the e-mail a person writes to form a profile of his or her expertise. Given the properties of on-line discussions, discussed below, it is reasonable to suppose that a fourth source of evidence could be the content of the questions answered by a person in such a system, with the added advantage that such a person is already willing to be helpful. This example is a simple case of the social interaction dimension in expertise location which, as found in empirical studies (e.g., Reference 30), is an important factor but is not yet reflected in available applications, perhaps because of the difficulty of capturing aspects such as the expert's communication skills, in order to rate how useful he or she is likely to be.

    Tacit to explicit

According to Nonaka, the conversion of tacit to explicit knowledge (externalization) involves forming a shared mental model, then articulating it through dialog. Collaboration systems and other groupware (for example, specialized brainstorming applications31) can support this kind of interaction to some extent.

On-line discussion databases are another potential tool to capture tacit knowledge and to apply it to immediate problems. We have already noted that team members may share knowledge in groupware applications. To be most effective for externalization, the discussion should be such as to allow the formulation and sharing of metaphors and analogies, which probably requires a fairly informal and even freewheeling style. This style is more likely to be found in chat and other real-time interactions within teams.

Newsgroups and similar forums are open to all, unlike typical team discussions, and share some of the same characteristics in that questions can be posed and answered, but differ in that the participants are typically strangers. Nevertheless, it is found that many people who participate in newsgroups are willing to offer advice and assistance, presumably driven by a mixture of motivations including altruism, a wish to be seen as an expert, and the thanks and positive feedback contributed by the people they have helped.

Within organizations, few of the problems experienced on Internet newsgroups are found, such as flaming, personal abuse, and irrelevant postings. IBM's experience in this regard is described by Foulger.14 Figure 3 shows a typical exchange in an internal company forum, rendered here using a standard newsgroup browsing application. It illustrates how open discussion groups are used to contribute knowledge in response to a request for help. Note both the speed of response and the fact that the answerer has made other contributions previously. The archive of the forum becomes a repository of useful knowledge. Clearly the question answerer in this case has made a number of contributions and could be considered to be an expert. Although the exchange is superficially one of purely explicit knowledge, the expert must first make a judgment as to the nature of the problem and then as to the most likely solution, both of which bring his or her tacit knowledge into play. Once the knowledge is made explicit, persons with similar problems can find the solution by consulting the archive. A quantitative study32 of this

phenomenon in the IBM system showed that the great majority of interchanges were of this question-and-answer pattern, and that even though a large fraction of questions were answered by just a few persons, an equal proportion were answered by persons who only answered one or two questions. Thus the conferencing facility enabled knowledge to be elicited from the broad community as well as from a few experts.

[Figure 3: An example of an exchange in an internal company forum]

    Explicit to explicit

There can be little doubt that the phase of knowledge transformation best supported by IT is combination, because it deals with explicit knowledge. We can distinguish the challenges of knowledge management from those of information management by bearing in mind that in knowledge management the conversion of explicit knowledge from and to tacit knowledge is always involved. This leads us to emphasize new factors as challenges that technology may be able to address.

Capturing knowledge. Once tacit knowledge has been conceptualized and articulated, thus converting it to explicit knowledge, capturing it in a persistent form as a report, an e-mail, a presentation, or a Web page makes it available to the rest of the organization. Technology already contributes to knowledge capture through the ubiquitous use of word processing, which generates electronic documents that are easy to share via the Web, e-mail, or a document management system. Capturing explicit knowledge in this way makes it available to a wider audience, and improving knowledge capture is a goal of many knowledge management projects. One issue in improving knowledge capture is that individuals may not be motivated to use the available tools to capture their knowledge. Technology may help by improving their motivation or by reducing the barriers to generating shareable electronic documents.

One way to motivate people to capture knowledge is to reward them for doing so. If rewards are to be linked to quality rather than quantity, some way to measure the quality of the output is needed. Quality in the abstract is extremely difficult to assess, since it depends on the potential use to which the document is to be put. For example, a document that explains basic concepts clearly would be useful for a novice but useless to someone who is already an expert. If we focus on usefulness as a measure of quality, and if we substitute "use" for "usefulness," then we have something that IT systems can measure. In fact, portal infrastructures that mediate access to documents can easily accumulate metrics of document use, and hence can estimate usefulness and quality. The next generation of products will include such features.28

Another measure of quality is the number of times a document has been cited, as in the scholarly literature, or the number of times it has been hyperlinked to, as on the Internet. A citation or hyperlink is evidence that the author of the citing or linking document thought that the target document is valuable. The most valuable or authoritative documents can be detected in Internet applications by analyzing the links between Web pages, thus measuring the cumulative effects of numerous value judgments (e.g., see References 33 and 34). The numeric quality estimate that can be derived is useful in information retrieval, where it can be used to boost the position of high-quality documents in the search results list. This method has been applied to citation analysis in scientific papers by the ResearchIndex search engine35,36 and to Web search by the Google search engine.37
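The kind of link analysis referred to above can be illustrated with a generic PageRank-style iteration. This sketch is not the algorithm of any particular cited system, and the three-document link graph is invented.

```python
# Sketch of deriving a quality score from links between documents, in the
# spirit of the link-analysis methods cited above: a generic PageRank-style
# power iteration over a toy link graph.
def link_scores(links, damping=0.85, iterations=50):
    """links: dict mapping each doc to the docs it links to."""
    docs = list(links)
    n = len(docs)
    score = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        new = {d: (1 - damping) / n for d in docs}
        for d, outs in links.items():
            if outs:
                share = damping * score[d] / len(outs)
                for target in outs:
                    new[target] += share
            else:  # dangling doc: spread its score evenly
                for target in docs:
                    new[target] += damping * score[d] / n
        score = new
    return score

# Doc "a" is linked to by both others, so it ends up ranked highest.
scores = link_scores({"a": ["b"], "b": ["a"], "c": ["a"]})
print(max(scores, key=scores.get))  # -> a
```

A search system can then boost documents with high scores in its result list, as described above.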

Citation analysis of this kind detects quality assessments made in the course of authoring documents. Quality judgments by experts are another way to capture their knowledge. There are, of course, many deployed solutions in which documents undergo a quality review through a refereeing process, often facilitated by a workflow application. In this case, the quality judgment acts as a gate, and documents judged to be of low quality are not distributed. However, technology also makes it feasible to record judgments as annotations of existing documents.38 Here, the association of an annotation with a document is recorded in some infrastructure, such as a special annotation server that the user's browser accesses to find annotations of the Web page being viewed. Numeric data stored in databases can also be annotated39 to record various interpretations, judgments, or cautions. Annotations may also support collaboration around documents,40 although, as in other applications where the underlying documents may be altered, the annotation system needs to be robust in the face of changes.

Although the most common way to capture knowledge by far is to write a document, technology has made the use of other forms of media feasible. Digital audio and video recordings are now easily made, and an expert may find that speaking to a camera or microphone is easier or more convenient than


writing, particularly if the video is of a presentation that has to be made in the ordinary course of business, or if the audio recording can be made in an otherwise unproductive free moment. It is also now relatively easy to distribute audio and video over networks. However, nontext digital media have the disadvantage of being more difficult to search and to browse than text documents and, hence, are less usable as materials in a repository of knowledge. Browsing of video has been improved by summarization techniques that automatically produce a gallery of extracted still images, each of which represents a significant passage in the video.41 If the video is of someone giving a presentation, images of the speaker alone will not convey as much as a summary that includes images of any visual aids, such as slides or charts, that accompany the narrative. Several systems that key a recording of a presentation to the slides have been described.42-44

Although video searching systems have been built that use image searching45 of extracted frames,46,47 they are hampered by the difficulty of composing a semantically meaningful image query. A more fruitful approach to searching is to extract text from the multimedia object, if possible. Although in some cases the video may contain text (on images of text slides), in most cases the challenge is to convert speech to text.

Speech recognition. Improvements in the accuracy of automatic speech recognition (ASR) hold out the promise of usable speaker-independent recognition with unconstrained vocabulary in the foreseeable future. Figure 4 shows progress with time in a number of standardized speech recognition tasks. Word error rates were reported in the Speech Recognition Workshop conferences of the National Institute of Standards and Technology. The accuracy varies with the difficulty of the task. The resource management task involves read speech with a 1000-word vocabulary. Broadcast news uses recordings with an approximately 20K-word vocabulary, whereas CallHome and Switchboard are telephone (lower speech quality) recognition tasks with unconstrained vocabulary. In all cases the accuracy shows steady improvement with time.

Accuracy for speech recorded under controlled conditions is already acceptable, but the error rate for poor quality recordings (for example, from the telephone) is still high enough to cause problems for applications unless the vocabulary is constrained. However, the trends depicted in Figure 4 show that future improvements can reasonably be expected and will lead to new ways to capture knowledge.

Figure 4  Improvement in various automatic speech transcription tasks over time. [Chart: word error rate (%), logarithmic scale from 1 to 100, versus year, 1985 to 2000, for the CallHome, Switchboard, resource management, and broadcast news tasks.]

Although perfect or near-perfect transcription produces a text transcript that can be browsed like any other piece of text, ways to make an imperfect transcript usable as a browsing aid are being investigated.48,49 In this work even an imperfect transcript supports browsing because certain words and phrases, which are judged to be significant and for which the estimated accuracy of ASR is high, are highlighted.

Such techniques can be used to make the replay of audio more usable even where the transcript as a whole is unreadable because of the density of errors. The highlights can be used to find the passage of interest.
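A minimal sketch of such confidence-based highlighting follows. The (word, confidence) transcript format and the 0.9 threshold are illustrative assumptions, since ASR engines expose confidence scores in different ways.

```python
# Sketch: highlight high-confidence words in an imperfect ASR transcript.
# The (word, confidence) format and the 0.9 threshold are illustrative
# assumptions, not taken from a particular ASR engine.

def highlight_transcript(words, threshold=0.9):
    """Render words whose recognition confidence meets the threshold
    in upper case, so a reader can skim for salient passages."""
    rendered = []
    for word, confidence in words:
        rendered.append(word.upper() if confidence >= threshold else word)
    return " ".join(rendered)

hypotheses = [("the", 0.98), ("quarterly", 0.95), ("revenue", 0.93),
              ("grew", 0.60), ("despite", 0.55), ("headwinds", 0.91)]
print(highlight_transcript(hypotheses))
# -> THE QUARTERLY REVENUE grew despite HEADWINDS
```

A real system would highlight only words that are both high-confidence and judged significant (for example, by term weighting), rather than every confident word as here.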

Search. The most important technology for the manipulation of explicit knowledge helps people with the most basic task of all: finding it. Since the trend in most organizations is for essentially all documents to become available in electronic form on line, the challenge of on-line access has been transformed into the challenge of finding the materials relevant for some task. Furthermore, the total amount of potentially relevant information, including what is on the Internet and company intranets and what is available from commercial on-line publishers, continues to grow rapidly. Thus text search, which only 10 years ago was a tool primarily used by librarians to search bibliographic databases, has become an everyday application used by almost everyone. Not surprisingly, the new uses of text search have motivated new work on the technology.

Another driving factor in the use of on-line explicit knowledge is the diversity of sources from which it is available. It is not uncommon for users to have to look in several databases or Web sites for potentially relevant information. Since there is little standardization, users have to cope with different user interfaces, different search language conventions, and different result list presentations. Portals, described in another paper in this issue,50 are a popular approach to reducing the complexity of the user's task. The key aspect that allows a portal to do this is that it maintains its own meta-data about the information to which it gives access. In the current state of the art, the meta-data may be quite simple, consisting of a list of sources and a search index formed from the content of the sources. Even this simple function provides great value because it relieves the user of the need to visit all the sources to find out whether they contain relevant information. The user is therefore made more productive, and the quality of his or her work is improved. Most portal systems use a single search index, which requires that the documents in the domain of interest be retrieved by spidering or crawling at indexing time. The alternative, using distributed search as in, for example, the Harvest project,51 has not proved to be popular for knowledge management applications, perhaps because advances in hardware have made it cheaper to build a central index. Recent developments in peer-to-peer applications, such as Gnutella52 and the collaboration application Groove,53 have promoted a new interest in distributed search, which may lead to new advances.

The index that is built by a text search engine consists of a list of the words that occur in the indexed documents, along with a data structure (the inverted file) that allows the documents in which the words occurred to be determined efficiently at search time.54 Users can therefore use query words that they expect to occur in the documents. The problem is that not all the documents will use the same words to refer to the same concept and, therefore, not all the documents that discuss the concept will be retrieved. In a world of information overload this situation is not usually a problem, but for applications where it is important to have high recall, an alternative approach can be used in which documents are assigned meta-data that describe the concepts they discuss in a controlled vocabulary. This is a classical approach used in bibliographic databases. However, where searches are being done by untrained end users rather than librarians, the evidence is that searching with natural language gives better results than does searching with a controlled vocabulary.55
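As an illustration of the inverted file, the following minimal sketch maps each word to the set of documents containing it and resolves a Boolean AND query; the sample documents are invented, and real engines add positional data, ranking, and index compression.

```python
# Sketch of an inverted file: for each word, record the set of documents
# in which it occurs, so a query word can be resolved to matching
# documents without scanning every document at search time.
from collections import defaultdict

def build_inverted_index(docs):
    index = defaultdict(set)          # word -> set of document ids
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return documents containing every query word (Boolean AND)."""
    word_sets = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*word_sets) if word_sets else set()

docs = {1: "knowledge management technology",
        2: "document search technology",
        3: "tacit knowledge sharing"}
idx = build_inverted_index(docs)
print(sorted(search(idx, "knowledge technology")))   # -> [1]
```

The sketch also makes the recall problem above concrete: a document that says "know-how" instead of "knowledge" simply never appears in the posting set for the query word.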

The most common problem in a search is that a query retrieves many documents that are irrelevant to the user's needs, known as the problem of search precision (a measure of accuracy). Precision is of paramount importance in a world of info-glut. However, results from TREC (Text REtrieval Conference)56 indicate that the accuracy of natural language search engine technology has reached a plateau in recent years. What are the prospects of improvements to the search function that will benefit knowledge management systems? Two areas of potential improvement can be identified: increased knowledge of the user and of the context of his or her information need, and improved knowledge of the domain being searched.

The notion that increased knowledge of the user can be beneficial comes from the realization that in almost all search systems today the only information about the user's information need that is available to the system is the query. The most common query submitted to Web-based search services is two words, and the average query length is only about 2.3 words.57 Obviously, this is not much information. A challenging research area is to gather better information about the context of a search and to build search engines that can use this information to good advantage.

The goal of gathering and using more information about the domain being searched is one that is well-established, but progress so far has been limited. It is common to use a thesaurus (a kind of simple domain model) as an adjunct to a search, although this is more common in systems designed for specialists. Expansion of a query with synonyms is known to improve the recall in a text search, but expansion is only effective in well-defined domains where the ambiguity of words, and the validity of term relationships, is not an issue. To improve precision in broad-domain searching by reducing the ambiguity of ordinary words using thesauri or other structures such as ontologies has been a goal of much research, with many negative results (e.g., Reference 58). Recently, however, some encouraging findings have been obtained.54 Using WordNet59 (a large manually built thesaurus that is widely available), combined with automatically built data structures encoding co-occurrence and head-modifier relations, Mandala et al.60 showed significant improvements in average precision, a measure of accuracy, as shown in Figure 5. The results were obtained using TREC data, from queries derived from the search topics using the title field, the title and description fields, or all the fields in the topic. Woods et al.61 also reported improvements by using a different approach to encoding knowledge of the domain, in this case a semantic network that integrated syntactic, semantic, and morphological relationships.
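Synonym-based query expansion can be sketched as follows. The tiny hand-built thesaurus is a stand-in for a resource such as WordNet, and the sketch ignores the word-sense ambiguity that, as noted above, limits this technique in broad domains.

```python
# Sketch of synonym-based query expansion to improve recall. The tiny
# thesaurus below is a hand-built, hypothetical stand-in for a resource
# such as WordNet; word-sense disambiguation is deliberately omitted.

THESAURUS = {
    "car": ["automobile", "vehicle"],
    "fix": ["repair", "mend"],
}

def expand_query(query):
    """Add known synonyms of each query word to the term list."""
    terms = []
    for word in query.lower().split():
        terms.append(word)
        terms.extend(THESAURUS.get(word, []))
    return terms

print(expand_query("fix car"))
# -> ['fix', 'repair', 'mend', 'car', 'automobile', 'vehicle']
```

The expanded term list would then be submitted to the search engine in place of the raw query, typically with lower weights on the added synonyms so that exact matches still rank highest.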

Taxonomies and document classification. Knowledge of a domain can also be encoded as a knowledge map, or taxonomy, i.e., a hierarchically organized set of categories. The relationships within the hierarchy can be of different kinds, depending on the application, and a typical taxonomy includes several different kinds of relations. The value of a taxonomy is twofold. First, it allows a user to navigate to documents of interest without doing a search (in practice, a combination of the two strategies is often used if it is available). Second, a knowledge map allows documents to be put in a context, which helps users to assess their applicability to the task in hand. The most familiar example of a taxonomy is Yahoo!,62 but there are many examples of specialized taxonomies used at other sites and in company intranet applications.

Figure 5  Improved average precision in text search using combined thesauri for query expansion.60 [Chart: average precision (%), 0 to 30, with no expansion and with combined thesauri, for average query lengths of 2.5, 14.3, and 57.6 words (title, description, or all fields).]

Manually assigning documents to the categories in a taxonomy requires significant effort and cost, but in recent years automatic document classification has advanced to the point where the accuracy of the best-performing algorithms exceeds 85 percent (F1 measure) on good quality data.63 This degree of accuracy is adequate for many applications and is in fact comparable to what can be achieved by manual classifiers in a well-organized operation,64 although the accuracy of automatic classification over different types of data varies quite widely.65 An attractive feature of the current generation of automatic classifiers is their inclusion of machine-learning algorithms that train themselves from example data, whereas the previous generation required construction of a complex description of the category in the form, for example, of an elaborate query. Selecting documents as training examples is a simpler task.
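As a sketch of a classifier that trains itself from example documents, the following implements a minimal multinomial naive Bayes model from scratch; the training examples and category names are invented for illustration, and this is only one of many learning algorithms used in practice.

```python
# Sketch of a text classifier trained from example documents: a minimal
# multinomial naive Bayes model with add-one (Laplace) smoothing. The
# training data and categories are invented for illustration.
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (text, label) pairs. Returns model parameters."""
    word_counts = defaultdict(Counter)   # label -> word frequencies
    label_counts = Counter()             # label -> number of documents
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(model, text):
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # log prior plus smoothed log likelihood of each word
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train([("stock market earnings report", "finance"),
               ("quarterly earnings and revenue", "finance"),
               ("football match final score", "sport"),
               ("league champions win trophy", "sport")])
print(classify(model, "revenue report"))   # -> finance
```

The point of the sketch is the workflow: the category is described only by example documents, and adding a category or refining one means supplying more examples rather than hand-crafting an elaborate query.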

Automatic classification, although simple in concept, is capable of surprisingly refined distinctions, given enough training data. For example, it has been known for some time (see the brief review in Kukich66) that automatic essay marking systems can assign grades to student essays with an accuracy and consistency only slightly worse than human graders, and recently it has been shown that a document classifier can perform well in this application.67 Table 3 shows the results of comparing two human graders and an automatic classifier. The automatic classifier performed very nearly as well as the human graders, both in accuracy and consistency, even though the test essays were on unconstrained subjects.

Despite the power of automatic classification, there are many challenges in implementing solutions using taxonomies. The first challenge is the design of the taxonomy, which has to be comprehensible to users (so that they can use it for navigation with no or minimal training) and has to cover the domain of interest in enough detail to be useful. There are a number of strategies for building a taxonomy,68 including the use of document clustering to propose candidate subcategories. However, human input is probably required to ensure that the taxonomy reflects business needs (e.g., it emphasizes some aspect that may be significant but is not a strong theme in the documents). Thus, clustering can be seen as an adjunct to human effort. One usability challenge is to ensure that the user of a taxonomy editor can understand the clusters that are proposed, using automatically generated labels. The labels typically contain words or phrases that are chosen to represent the documents in the cluster; recently a technique for using extracted sentences has been proposed.69,70

Taxonomies have proved to be a popular way in which to build a domain model to help users to search and navigate, so much so that the trend seems to be for each group of users of any size to have its own taxonomy. This popularity is understandable because as on-line tools become central to individuals' work, they naturally want to see the information displayed within a schema that reflects their own priorities and worldview, and that uses the terminology that they use. This trend is likely to lead to a proliferation of taxonomies in knowledge management applications. It follows that there will be an increasing focus on the need to map from one taxonomy to another so as to bridge between the schemas used by different groups within an organization.

Portals and meta-data. As already mentioned, portals provide a convenient location for the storage of meta-data about documents in their domain, and two examples of such meta-data, search indexes and a knowledge map or taxonomy, have been discussed. In the future, increasing use of natural language processing (NLP) in portals is likely to generate new kinds of meta-data. The general trend is for more structured information (meta-data) to be automatically generated as part of the indexing service of the portal. It is efficient to generate these meta-data when the document has been retrieved for text indexing. The value of the meta-data is in encapsulating information about the document that can be used to build selected views of the information space, such as a list of the documents in a given subject category, or mentioning a geographic location, through a database lookup in response to a user click. This makes exploration of the information easier and more rewarding, in effect providing the user with a new experience based on the exploration on which new tacit knowledge can be built as part of the internalization process to be discussed later.

Table 3  Essay grading with an automatic text classifier66

                         Exact Grade (%)    Adjacent Grade (%)
G1: auto vs manual*            55                  97
G1: manual A vs B              56                  95
G2: auto vs manual*            52                  96
G2: manual A vs B              56                  95

*The performance of the classifier is compared with two human markers, A and B, and it performs almost as well. In each comparison, the proportion of test essays where the same or an adjacent grade was assigned is given. Here "manual" refers to the average of the two human graders, whereas G1 and G2 are two open-domain essay-writing tasks.

Summarization. Document summaries are examples of meta-data of this kind. The value of a summary is that it allows users to avoid reading a document if it is not relevant to their current tasks. Figure 6 shows results from Tombros and Sanderson,71 who showed that users performing a simple information-seeking task had to read many fewer full documents when they used a system that provided summaries than when the system provided document titles alone. Automatic generation of summaries is an active area of research. Commercially available summarizers use the sentence-selection method, originated by Luhn in 1958,72 in which an indicative summary is constructed from what are judged to be the most salient sentences in a document. However, the summary may be incoherent, e.g., if the selected sentences contain anaphors. Construction of more coherent summaries, implying the use of natural language generation, currently requires that the subject domain of the documents be severely restricted, as for example, to basketball games.73 Summarization of long documents containing several topics is improved by topic segmentation74 and can be further condensed for presentation on handheld devices,75 whereas summarization of multiple documents, either about the same event76 or in an unconstrained set of domains,70 is another challenge being addressed by current research. For other recent work see References 77 through 79.
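The sentence-selection method can be sketched as follows: score each sentence by the frequency of its content words in the document and keep the top-scoring sentences in their original order. The stop-word list and scoring rule are simplified assumptions rather than Luhn's exact formulation.

```python
# Sketch of Luhn-style sentence selection: sentences whose content words
# are frequent in the document are judged salient. The stop-word list
# and scoring are simplified assumptions, not Luhn's exact method.
from collections import Counter

STOP = {"the", "a", "of", "and", "to", "is", "in", "it"}

def summarize(text, n_sentences=1):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    words = [w for w in text.lower().replace(".", " ").split()
             if w not in STOP]
    freq = Counter(words)
    def score(sentence):
        return sum(freq[w] for w in sentence.lower().split()
                   if w not in STOP)
    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # emit the selected sentences in their original document order
    return ". ".join(s for s in sentences if s in ranked) + "."

text = ("Search engines index documents. Cats sleep. "
        "Search engines rank documents by relevance.")
print(summarize(text))   # -> Search engines rank documents by relevance.
```

The anaphor problem mentioned above is visible even at this scale: a selected sentence beginning "They also rank..." would be meaningless without the sentence that precedes it in the source.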

    Explicit to tacit

Technology to help users form new tacit knowledge, for example, by better appreciating and understanding explicit knowledge, is a challenge of particular importance in knowledge management, since acquisition of tacit knowledge is a necessary precursor to taking constructive action. A knowledge management system should, in addition to information retrieval, facilitate the understanding and use of information. For example, the system might, through document analysis and classification, generate meta-data to support rapid browsing and exploration of the available information. It seems likely that the future trend will be for information infrastructures to perform more of this kind of processing in order to facilitate different modes of use of information (e.g., search, exploration, finding associations) and thus to make the information more valuable by making it easier to form new tacit knowledge from it. Other processing of explicit knowledge, already described, can support understanding. For example, putting a document in the context of a subject category or of a step in a business process, by using document categorization, can help a user to understand the applicability or potential value of its information. Discovery of relationships between and among documents and concepts helps users to learn by exploring an information space.

A quite different set of technologies applies to the formation of tacit knowledge through learning, especially in the domain of on-line education or distance learning. Within organizations, on-line learning has the advantage of being able to be accomplished without travel and at times that are compatible with other work. A wide variety of tools and applications support distance learning.80 The needs of the corporate training market, emphasizing self-directed learning rather than instructor-led learning, have led to a focus on interactive courseware based on the Web or on downloaded applications. In the future, modules of self-directed training will be found in portals, along with other materials.

Figure 6  The proportion of documents read by subjects using information retrieval systems to perform a task.71 [Chart: documents reviewed per query (%), 0 to 30, with and without use of a summary.]

Information overload is a trend that motivates the adoption of new technology to assist in the comprehension of explicit knowledge. The large amounts of (often redundant) information available in modern organizations, and the need to integrate information from many sources in order to make better decisions, cause difficulties for knowledge workers and others.81 Both of these trends result directly from the large amounts of on-line information available to knowledge workers in modern organizations. Information overload occurs when the quality of decisions is reduced because the decision maker spends time reviewing more information than is needed, instead of reflecting and making the decision. Various approaches to mitigating information overload are feasible. The redundancy and repetition in the information can be reduced by eliminating duplicate or overlapping messages (related to the Topic Detection and Tracking track at TREC82). An agent can filter or prioritize the messages, or compound views can make it easier to review the incoming information. Finally, visualization techniques can be applied in an attempt to help the user understand the available information more easily.
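The elimination of duplicate or overlapping messages can be sketched with a simple word-overlap (Jaccard) similarity test. The similarity measure and the 0.8 threshold are illustrative assumptions; production systems typically use more robust techniques such as shingling or fingerprinting.

```python
# Sketch of duplicate/overlap elimination for incoming messages using
# Jaccard similarity over word sets. The 0.8 threshold is an arbitrary
# illustrative choice, not a recommended production setting.

def jaccard(a, b):
    """Word-set overlap between two texts, in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def deduplicate(messages, threshold=0.8):
    """Keep a message only if it is not too similar to one already kept."""
    kept = []
    for msg in messages:
        if all(jaccard(msg, k) < threshold for k in kept):
            kept.append(msg)
    return kept

msgs = ["server down in building 3",
        "server down in building 3",          # exact duplicate, dropped
        "cafeteria menu updated for friday"]
print(len(deduplicate(msgs)))   # -> 2
```

Word-set overlap ignores word order, so two messages that reuse the same vocabulary in different senses can be falsely merged; this is one reason practical systems prefer order-sensitive shingles.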

Different visualizations of a large collection of documents have been used with the goal of making subject-based browsing and navigation easier. These methods include text-based category trees, exemplified by the current Yahoo! user interface. Several graphical visualizations have also been described. Themescape83 uses (among other things) a shaded topographic map as a metaphor to represent the different subject themes (by location), their relatedness (by distance), and the proportional representation of the theme in the collection (by height), whereas VisualNet84 uses a different map metaphor for showing subject categories. Another approach is represented by the Cat-a-Cone system85 that allows visualization of documents in a large taxonomy or ontology. In this system the model is three-dimensional and is rendered using forced perspective. Search is used to select a subset of the available documents for visualization.

Other visualization experiments have attempted to provide a user with some insight into which query terms occur in the documents in a results list, as was done in Hearst's TileBars86 and the application described by Veerasamy and Belkin.87 However, the evaluation described in the latter paper showed that the advantage of the visualization in the test task was small at best. A later study,88 which compared text, two-dimensional, and pseudo three-dimensional interfaces for information retrieval, found that the richer interfaces provided no advantage in the search tasks that were studied. This result may explain why graphical visualization has not been widely adopted in search applications, whereas text-based interfaces are ubiquitous.

Perhaps a more promising application of visualization is to help a user grasp relationships, such as those between concepts in a set of documents, as in the Lexical Navigation system described by Cooper and Byrd,89 or the relationships expressed as hyperlinks between documents.90 This use is more promising because of the difficulty of rendering relationships textually. Furthermore, figuring out the relationships within a set of documents is a task that requires a lot of processing, and computer assistance is of great value.

    Conclusion

This paper has surveyed a number of technologies that can be applied to build knowledge management solutions and has attempted to assess their actual or potential contributions to the processes underlying organizational knowledge creation using the Nonaka model. The essence of this model is to divide the knowledge creation processes into four categories: socialization (tacit knowledge formation and communication), externalization (formation of explicit knowledge from tacit knowledge), combination (use of explicit knowledge), and internalization (formation of new tacit knowledge from explicit knowledge). The value of this model in the present context is that it focuses attention on tacit knowledge (which is featured in three of the four processes) and thus on people and their use of technology.

Because all four of the processes in the Nonaka model are important in knowledge management, which aims to foster organizational knowledge creation, we might seek to support all of them with technology. Although early generations of knowledge management solutions (solutions typically integrate several technologies) focused on explicit knowledge in the form of documents and databases, there is a trend to expand the scope of the solutions somewhat to integrate technologies that can, to some extent, foster the use of tacit knowledge. Among the technologies now being applied in some knowledge management solutions are those for electronic meetings, for text-based chat, for collaboration (both synchronous and asynchronous), for amassing judgments about quality, and for so-called expertise location. These technologies are in addition to those for handling documents, such as search and classification, which are already well-established yet are still developing.

Despite these trends, there are still significant shortfalls in the ability of technology to support the use of tacit knowledge, for which face-to-face meetings are still the touchstone of effectiveness. As Ackerman has pointed out, this lack of ability is not just because the designers of the applications do not appreciate how important the human dimension is (although that is true in some cases). We simply do not understand well enough how to accommodate this dimension in computer-supported cooperative work. Many of the factors that mediate effective face-to-face human-human interactions are not well understood, nor do we have good models for how they might be substituted for or synthesized in human-computer interactions. We can expect gradual progress in this direction, perhaps aided by improvements in the general fidelity with which people's faces, expressions, and gestures are rendered in (for example) high-bandwidth videoconferencing, but there can be no assurance of an immediate breakthrough because of the complexity of the problem and the current shortfall in the basic understanding of its elements.

However, the survey in this paper has highlighted many factors that provide grounds for some optimism when we consider how technology can help in knowledge management. Technology can assist teams, who in today's world may meet only occasionally or even never, to share experiences on line in order to be able to build and share tacit knowledge, and more generally to work effectively together, even if the efficiency is less than in face-to-face meetings. From the perspective of tacit knowledge formation and sharing, the relative informality of text-based chat is probably superior to more structured discussions, which may, however, be effective for sharing explicit knowledge. The importance of limiting access to team members has been highlighted by recent work. The chat archive, and other recordings of on-line meetings, have the added advantage of being able to help in the socialization of people who miss parts of the original interaction. It is also encouraging that recent work by Olson and Olson and their collaborators has shown that studio-quality video is helpful in some tasks related to knowledge management, such as collaboration (in some cases) and trust building.26

Another encouraging use of technology is to help persons who need to share knowledge to find each other. Expertise location systems are in their infancy in industrial practice but hold out the promise of being able to identify individuals with the right knowledge. Even without actually identifying a person, unrestricted forums and bulletin boards have been shown to be effective in eliciting assistance both from experts and from the broader community. It seems likely that appropriate integration of this approach with chat on the one hand and expertise location on the other will result in more effective access to and communication of the knowledge in an organization.

Another way to tap the knowledge of experts is through capturing their judgments, expressed as annotations, hyperlinks, citations, and other interactions with documents. Portal infrastructures, which mediate and can collect metrics on the interaction of people and documents, are ideal for amassing this kind of information. Currently, portal products are just becoming capable of accumulating meta-data of this kind. Another trend is for their meta-data to become richer and to support a broader range of tasks. In particular, the meta-data can support the formation of new tacit knowledge from the explicit knowledge indexed by the portal, for example, by situating documents within a new conceptual framework represented by a knowledge map. It is becoming cheaper to use several different frameworks for this purpose, and thus to match them better to the needs of different groups of users, because the accuracy of automatic text classification is improving and, for some classes of content such as news stories, is already as good as the accuracy of human indexers.

Technology will clearly become more helpful in dealing with information overload. Techniques such as summarization can reduce the load on persons attempting to find the right documents to use in some task. There is some promise, as yet unfulfilled, that intelligent agents may in the future help persons to prioritize the messages they receive. And the meta-data stored by portals can be used to draw visualizations of large amounts of information, although, contrary to intuition, graphical visualizations seem not to be better than their text-based equivalents, at least for information retrieval tasks.

Finally, it should be emphasized again that this paper has dealt with human knowledge, not with the formation or use of expert systems or similar knowledge-based systems that aim to replace human reasoning with machine intelligence. The current capability of machine intelligence is such that, for the great majority of business applications, human knowledge will continue to be a valuable resource for the foreseeable future, and technology to help to leverage it will be increasingly valuable and capable.

**Trademark or registered trademark of Lotus Development Corporation, Microsoft Corporation, or Tacit Knowledge Systems.

    Cited references

1. T. H. Davenport and L. Prusak, Working Knowledge: How Organizations Manage What They Know, Harvard Business School Press, Boston, MA (1998).

2. M. Polanyi, The Tacit Dimension, Routledge & Kegan Paul, London (1966).


    IEEE International Conference on Acoustics, Speech, and Signal Processing (1999).

    44. S. Srinivasan, D. Ponceleon, A. Amir, and D. Petkovic, "What Is in That Video Anyway?: In Search of Better Browsing," IEEE International Conference on Multimedia Computing and Systems, Florence, Italy (1999).

    45. W. Niblack, R. Barber, W. Equitz, M. Flickner, E. H. Glasman, D. Petkovic, P. Yanker, C. Faloutsos, and G. Taubin, "The QBIC Project: Querying Images by Content, Using Color, Texture, and Shape," Storage and Retrieval for Image and Video Databases, SPIE Proceedings Series, Vol. 1908, San Jose, CA (February 1993), pp. 173–187.

    46. M. Flickner, H. S. Sawhney, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, "Query by Image and Video Content," Computer 28, No. 9, 23–32 (1995).

    47. M. G. Christel, T. Kanade, M. Mauldin, R. Reddy, M. Sirbu, S. M. Stevens, and H. D. Wactlar, "Informedia Digital Video Library," Communications of the ACM 38, No. 4, 57–58 (1995).

    48. J. W. Cooper, M. Viswanathan, and Z. Kazi, "Samsa: A Speech Analysis, Mining and Summary Application for Outbound Telephone Calls," Proceedings of the 34th Annual Hawaii International Conference on System Sciences, HICSS-34 (2001).

    49. E. W. Brown, S. Srinivasan, A. Coden, D. Ponceleon, J. W. Cooper, and A. Amir, "Toward Speech as a Knowledge Resource," IBM Systems Journal 40, No. 4, 985–1001 (2001, this issue).

    50. R. Mack, Y. Ravin, and R. J. Byrd, "Knowledge Portals and the Emerging Digital Knowledge Workplace," IBM Systems Journal 40, No. 4, 925–955 (2001, this issue).

    51. C. M. Bowman, P. B. Danzig, D. R. Hardy, U. Manber, and M. F. Schwartz, "The Harvest Information Discovery and Access System," Computer Networks and ISDN Systems 28, 119–125 (1995).

    52. Gnutella is a protocol for information-sharing technology; hub at http://gnutella.wego.com.

    53. Groove Networks, http://www.groove.net/.

    54. R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley Publishing Co., Reading, MA (1999).

    55. D. D. Lewis and K. Sparck Jones, "Natural Language Processing for Information Retrieval," Communications of the ACM 39, No. 1, 92 (1996).

    56. Proceedings of the Eighth Text Retrieval Conference (TREC-8), E. M. Voorhees and D. K. Harman, Editors, at http://trec.nist.gov/pubs.html/ (2000).

    57. A. Spink, D. Wolfram, B. J. Jansen, and T. Saracevic, "Searching the Web: The Public and Their Queries," Journal of the American Society of Information Science 53, No. 2, 226–234 (2001).

    58. E. M. Voorhees, "On Expanding Query Vectors with Lexically Related Words," Second Text Retrieval Conference (TREC-2) (1993).

    59. WordNet: An Electronic Lexical Database (Language, Speech and Communication), C. Fellbaum, Editor, MIT Press, Cambridge, MA (1998).

    60. R. Mandala, T. Tokunaga, and H. Tanaka, "Combining Multiple Evidence from Different Types of Thesaurus for Query Expansion," Proceedings of SIGIR 99 (1999).

    61. W. A. Woods, L. A. Bookman, A. Houston, R. J. Kuhns, and P. Martin, "Linguistic Knowledge Can Improve Information Retrieval," Language Technology Joint Conference, ANLP Sessions, Seattle, WA (April 29–May 4, 2000).

    62. Yahoo! is at http://www.yahoo.com.

    63. Y. Yang and X. Liu, "A Re-Examination of Text Categorization Methods," Proceedings of SIGIR 99 (1999).

    64. A. Joscelyne, "Automatic Information Refining: A Reuters Success Story," Language Industry Monitor, http://www.lim.nl/monitor/reuters.html (May/June 1991).

    65. T. Zhang and F. J. Oles, "Text Categorization Based on Regularized Linear Classification Methods," Information Retrieval 4, No. 1, 5–31 (2001).

    66. K. Kukich, "Beyond Automated Essay Scoring," IEEE Intelligent Systems 15, No. 5, 22–27 (September–October 2000).

    67. L. S. Larkey, "Automatic Essay Grading Using Text Categorization Techniques," SIGIR 98: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia (1998).

    68. W. Pohs, G. Pinder, C. Dougherty, and M. White, "The Lotus Knowledge Discovery System: Tools and Experiences," IBM Systems Journal 40, No. 4, 956–966 (2001, this issue).

    69. R. Kubota Ando, "Latent Semantic Space: Iterative Scaling Improves Precision of Inter-Document Similarity Measurement," Proceedings of SIGIR 00 (2000).

    70. R. K. Ando, B. K. Boguraev, R. J. Byrd, and M. S. Neff, "Multi-Document Summarization by Visualizing Topical Content," Proceedings of ANLP/NAACL 2000 Workshop on Automatic Summarization (2000).

    71. A. Tombros and M. Sanderson, "Advantages of Query Biased Summaries in Information Retrieval," Proceedings of SIGIR 98 (1998).

    72. H. P. Luhn, "The Automatic Creation of Literature Abstracts," IBM Journal of Research and Development 2, No. 2, 159–165 (1958).

    73. J. Robin and K. R. McKeown, "Corpus Analysis for Revision-Based Generation of Complex Sentences," Proceedings of the National Conference on Artificial Intelligence, Washington, DC (1993).

    74. B. K. Boguraev and M. S. Neff, "Discourse Segmentation in Aid of Document Summarization," Proceedings of the 33rd Hawaii International Conference on System Sciences, Maui, HI (2000).

    75. B. Boguraev, R. Bellamy, and C. Swart, "Summarization Miniaturization: Delivery of News to Hand-Helds," Proceedings of Workshop on Automatic Summarization, Annual Meeting of the North American Chapter of the Association for Computational Linguistics, Pittsburgh, PA (2001).

    76. D. R. Radev and K. R. McKeown, "Generating Natural Language Summaries from Multiple On-Line Sources," Computational Linguistics 24, 469–500 (1998).

    77. I. Mani, D. House, G. Klein, L. Hirshman, L. Obrst, T. Firmin, M. Chrzanowski, and B. Sundheim, The TIPSTER Summac Text Summarization Evaluation, Technical Report, Mitre Corporation, McLean, VA (1998).

    78. Advances in Automatic Text Summarization, I. Mani and M. Maybury, Editors, MIT Press, Cambridge, MA (1999).

    79. Proceedings of ANLP/NAACL 2000 and 2001 Workshops on Automatic Summarization.

    80. Advanced Learning Technology: Design and Development Issues, J. C. Kinshuk and T. Okamoto, Editors, IEEE Computer Society, Los Alamitos, CA (2000).

    81. D. Shenk, Data Smog: Surviving the Information Glut, Harper, San Francisco (1998).

    82. TREC Topic Detection and Tracking, see http://www.nist.gov/speech/tests/tdt/index.htm.

    83. J. A. Wise, J. J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, and V. Crow, "Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents," Proceedings of IEEE Information Visualization 95, Atlanta, GA (1995).

    84. VisualNet is described at http://www.map.net.

    85. M. Hearst and C. Karadi, "Cat-a-Cone: An Interactive Interface for Specifying Searches and Viewing Retrieval Results Using a Large Category Hierarchy," Proceedings of SIGIR 97, Philadelphia, PA (1997).

    86. M. A. Hearst, "TileBars: Visualization of Term Distribution Information in Full Text Information Access," Proceedings of ACM SIGCHI Conference on Human Factors in Computing Systems, Denver, CO (May 1995), pp. 59–66.

    87. A. Veerasamy and N. J. Belkin, "Evaluation of a Tool for Visualization of Information Retrieval Results," ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich (1996).

    88. M. M. Sebrechts, J. Cugini, S. J. Laskowski, J. Vasilakis, and M. S. Miller, "Visualization of Search Results: A Comparative Evaluation of Text, 2D, and 3D Interfaces," SIGIR 99: 22nd International Conference on Research and Development in Information Retrieval, Berkeley, CA (1999).

    89. J. W. Cooper and R. J. Byrd, "Lexical Navigation: Visually Prompted Query Expansion and Refinement," Proceedings of Digital Libraries 97, Philadelphia, PA (1997).

    90. I. Ben-Shaul, M. Herscovici, M. Jacovi, Y. S. Maarek, D. Pelleg, M. Shtalhaim, V. Soroka, and S. Ur, "Adding Support for Dynamic and Focused Search with Fetuccino," Proceedings of WWW8, Toronto (1999).

    Accepted for publication June 15, 2001.

    Alan D. Marwick IBM Research Division, Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, New York 10598 (electronic mail: [email protected]). Dr. Marwick received B.Sc. and D.Phil. degrees in physics from the University of Sussex in Britain, then worked on the application of nuclear methods of analysis to research problems in materials science and the effect of radiation on solids, first at the AEA Harwell Laboratory in Britain and then at the Watson Research Center. More recently he has led groups working on on-line access to the scientific literature, digital libraries, information retrieval, natural language processing, and knowledge management. In addition to his technical interests, he works on the practical problems of applied research and technology transfer in an industrial environment. Dr. Marwick currently manages the Knowledge Management Technology Department at the Research Center.
