www.kb.se Information retrieval from KWIC to Enterprise Search By Mats G. Lindquist National Library of Sweden

From KWIC to Enterprise Search - M G Lindquist

Online Information Meeting in London, Dec. 2007. "Acceptance speech" for the Strix award.

Information retrieval from KWIC to Enterprise Search

By

Mats G. Lindquist

National Library of Sweden

IR – the Technology Dimension

Information carriers:

• Punch cards (Hollerith cards)

• Magnetic tapes

• Direct access storage (disk, among others)

• CD-ROM (distribution copies of databases)

KWIC – an example of early technology

Needed: sort program and line printer
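
A KWIC (keyword in context) index needs little more than the two tools named above. Each title is rotated so that every significant word in turn becomes the sort key, the rotations are sorted, and the result is printed with the keywords aligned in one column. The following is a minimal sketch only: the stop-word list, the sample titles, and the function name `kwic_index` are illustrative assumptions, not taken from the talk.

```python
# Minimal KWIC sketch: rotate, sort, print. Names and data are illustrative.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}  # assumed stoplist


def kwic_index(titles, width=30):
    """Return formatted KWIC lines, sorted by keyword (case-insensitive)."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue  # stop words never become index keys
            left = " ".join(words[:i])    # context before the keyword
            right = " ".join(words[i:])   # keyword plus following context
            entries.append((word.lower(), left, right))
    entries.sort()  # the "sort program" step
    # Right-align the left context so every keyword starts in the same column.
    return [f"{left[-width:]:>{width}}  {right[:width]}"
            for _, left, right in entries]


if __name__ == "__main__":
    sample = ["Text retrieval and document databases",
              "The nature of meaning in the age of Google"]
    for line in kwic_index(sample):  # the "line printer" step
        print(line)
```

Truncating the context to a fixed width mirrors the fixed line length of a printed index; the width of 30 characters is an arbitrary choice.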

Magnetic tape

Task: Arrange the queries (Q’s) in core memory to give the maximum number of answers (A’s) per tape run.
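
Because a tape can only be read from front to back, the cost of a run is roughly fixed, so the more queries that are held in memory during a pass, the more answers each run yields. The sketch below illustrates that idea under simplifying assumptions: queries are single words, records are `(id, text)` pairs in a list, and `memory_slots` stands in for the core memory limit. All names are hypothetical; the talk does not describe how the queries were actually arranged.

```python
def tape_run(tape_records, queries):
    """One sequential pass over the tape, answering every query held in memory."""
    answers = {q: [] for q in queries}
    for record_id, text in tape_records:  # the tape is read once, front to back
        words = set(text.lower().split())
        for q in queries:
            if q.lower() in words:
                answers[q].append(record_id)
    return answers


def batched_runs(tape_records, queries, memory_slots):
    """Split queries into batches that fit in memory; one tape run per batch."""
    results = {}
    runs = 0
    for start in range(0, len(queries), memory_slots):
        batch = queries[start:start + memory_slots]
        results.update(tape_run(tape_records, batch))
        runs += 1
    return results, runs


if __name__ == "__main__":
    tape = [(1, "punch cards"), (2, "magnetic tapes"), (3, "punch tapes")]
    results, runs = batched_runs(tape, ["punch", "tapes", "cards"], memory_slots=2)
    print(runs, results)
```

With three queries and room for two in memory, two tape runs are needed; doubling the memory would halve the number of runs, which is the sense in which arranging queries in memory maximizes answers per run.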

IR – the Content Dimension

• 60’s Numbers, alphanumeric codes

• 70’s plus simple (short) texts

• 80’s plus long texts, extended character sets, office documents

• 90’s plus images, graphs

• 00’s plus moving images

Wherever there is information piling up there is a call for information retrieval.

The nature of the information determines the approaches and methods for IR.

IR – the Application Dimension

A development from structured scientific and technical (mostly) bibliographical information to information from more general information sources.

Information of greater variety in form and less control (of structure).

Freetext rules …

… OA?

Word processing and computing (ADP) came together in the mid 80’s

Office Automation was hot

Enter: Document Management

IR thinking and methods became accepted as a useful approach to the management of information from a corporate perspective

- DMS – document management systems

- IR features for searching data in general

Functionality matures

Essential features:

• Total content searchable (no stoplist)

• Complete pattern searching (# * ! : ?)

• Vocabulary look-up/search

• Search history re-useable

• Relevance ranking

• Hits in context

• Incremental updating

• RDB features (incl. some math)
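
Two of the features listed, pattern searching and hits in context, can be illustrated together. The sketch below uses `*` and `?` as wildcards in the sense of Python's standard `fnmatch` module. What the symbols `# * ! : ?` meant in any particular IR system is not stated in the slides, so this mapping is an assumption, as are the function name `hits_in_context` and the sample text.

```python
from fnmatch import fnmatchcase


def hits_in_context(text, pattern, window=3):
    """Return each matching word with `window` words of context on either side."""
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        # Compare on a normalized token: lower case, outer punctuation removed.
        token = word.strip(".,;:!?()\"'").lower()
        if fnmatchcase(token, pattern.lower()):
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append(f"{left} [{word}] {right}".strip())
    return hits


if __name__ == "__main__":
    sample = "Information retrieval systems retrieve documents; retrieving is hard."
    for hit in hits_in_context(sample, "retriev*", window=1):
        print(hit)
```

Every word is tested here, including words that a stoplist would normally drop, which corresponds to the "total content searchable" item in the list.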

and then came …

… the internet

IR – the Future Dimension

In the beginning web searching was done without using the experiences and achievements of IR.

But information was piling up – clearly new IR techniques were needed.

New rules OK?

• Internet (intranet) is the information carrier

• Applications challenge boundaries

– Private vs. Public

– Individual vs. Institutional

• In the enterprise

– The variety of form increases

– The volume increases

– Control of structure decreases

– Control of sources decreases

A new technology landscape

• Volumes and volumes of storage

• Plenty of computing power

• Huge amounts of bandwidth

The end of precision retrieval?

Precision retrieval is replaced by a new kind of retrieval, the Quick Search, which sets user ambitions and requirements.

Search engines are different from IR systems.

IR must rise to the challenges of Enterprise Search – in-house search engines.

What is ahead?

• The main stage for development must be the user interface for searching and presentation.

• Make searching even simpler.

Consider this headline from the Technology Guardian, May 3, 2007:

It’s time for Amazon to turn a new leaf and make searching for books at its site a whole lot easier.

(article by Wendy M Grossman)

The big challenge

• The big challenge is to achieve enough structure to enable precision search without losing too much recall.

• And without manual intervention – it must be automatic.
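
The trade-off named here is usually expressed with the two standard IR measures: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A minimal sketch, with made-up document ids and a hypothetical function name `precision_recall`:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set against known relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    relevant = {2, 4, 6}
    broad = [1, 2, 3, 4, 5, 6, 7, 8]  # a wide, unstructured search
    narrow = [2, 4]                   # a tighter, more structured search
    print(precision_recall(broad, relevant))
    print(precision_recall(narrow, relevant))
```

In this toy data the broad search finds every relevant document but buries it among irrelevant ones, while the narrow search returns only relevant documents but misses one of them.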

References

Ashford, John and Willett, Peter, Text retrieval and document databases, Bromley: Chartwell-Bratt, 1989 (ISBN: 0-86238-204-1)

Brooks, Terrence A. (2004), "The Nature of Meaning in the Age of Google", Information Research, 9(3) paper 180, http://InformationR.net/ir/9-3/paper180.html

Grossman, Wendy M., It’s time for Amazon to turn over a new leaf ---, The Technology Guardian, May 3, 2007, http://www.guardian.co.uk/technology/2007/may/03/comment.guardianweeklytechnologysection1

Lindquist, Mats G., "3RIP-COM: Integrating Information Retrieval and Computerized Conferencing", Proc. Am. Soc. Info. Science, vol. 17 (1980), pp. 71-73.

Lindquist, Mats G., "Information Resources Management (IRM), Yesterday, Today and Tomorrow", NORD IoD 6, Samfundet för Informationstjänst i Finland, Helsinki, 1985, pp. 275-280.

O’Neill, Edward T., Lavoie, Brian F., Bennet, Rick (2003), "Trends in the Evolution of the Public Web 1998-2002", D-Lib Magazine 9(4), http://www.dlib.org/dlib/april03/lavoie/04lavoie.html

Schneider, Karen G., How OPACs Suck, Part 2: The Checklist of Shame, posted 04/03/2006, http://www.techsource.ala.org/blog/2006/04/how-opacs-suck-part-2-the-checklist-of-shame.html
