Online Information Meeting in London, Dec. 2007. "Acceptance speech" for the Strix award.
www.kb.se
Information retrieval from KWIC to Enterprise Search
By
Mats G. Lindquist
National Library of Sweden
IR – the Technology Dimension
Information carriers:
• Punch cards (Hollerith cards)
• Magnetic tapes
• Direct access storage (disc and others)
• CD-ROM (distribution copies of db)
KWIC – an example of early technology
Needed: sort program and line printer
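A KWIC (keyword in context) index can be sketched in a few lines: every significant word of a title becomes an entry, the entries are sorted, and each entry is printed with its surrounding context. The following is a minimal illustration, not the original program. The function name `kwic_index`, the stopword list, and the column width are assumptions made for the example.

```python
# Hypothetical stopword list; real KWIC indexes used their own.
STOPWORDS = frozenset({"a", "an", "and", "for", "from", "in", "of", "the", "to"})


def kwic_index(titles, stopwords=STOPWORDS, width=30):
    """Return KWIC lines with each keyword aligned in the same column."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in stopwords:
                continue  # insignificant words do not become index entries
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            entries.append((word.lower(), left, word, right))

    # This step plays the role of the "sort program".
    entries.sort(key=lambda e: (e[0], e[1].lower(), e[3].lower()))

    # This step plays the role of the "line printer": fixed-width output.
    lines = []
    for _, left, word, right in entries:
        lines.append(f"{left[-width:]:>{width}}  {word} {right[:width]}")
    return lines


titles = [
    "Information retrieval from KWIC to Enterprise Search",
    "Text retrieval and document databases",
]
for line in kwic_index(titles):
    print(line)
```

Each title appears once per keyword, so the index grows with the number of significant words rather than the number of titles.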
Magnetic tape
Task: Arrange Q’s in core memory to give
maximum number of A’s per tape run.
IR – the Content Dimension
• 60’s Numbers, alphanumeric codes
• 70’s plus simple (short) texts
• 80’s plus long texts, extended character sets, office documents
• 90’s plus images, graphs
• 00’s plus moving images
Wherever there is information piling up
there is a call for information retrieval.
The nature of the information determines
the approaches and methods for IR.
IR – the Application Dimension
A development from structured scientific and technical (mostly) bibliographical information to information from more general information sources.
Information of greater variety in form and less control (of structure).
Freetext rules …
… OA?
Word processing and computing (ADP) came together in the mid 80’s
Office Automation was hot
Enter: Document Management
IR thinking and methods became accepted
as a useful approach to the management
of information from a corporate perspective
- DMS – document management systems
- IR features for searching data in general
Functionality matures
Essential features:
• Total content searchable (no stoplist)
• Complete pattern searching (# * ! : ?)
• Vocabulary look-up/search
• Search history re-useable
• Relevance ranking
• Hits in context
• Incremental updating
• RDB features (incl. some math)
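A few of these features can be illustrated with a small sketch: a full-text index without a stoplist, wildcard pattern searching, a crude relevance ranking, and hits in context. This is an assumption-laden toy, not a description of any particular system. The function names are hypothetical, only the shell-style wildcards `*` and `?` are supported (the meaning of the other pattern characters is not given here), and ranking is simply the number of matching words per document.

```python
import re
from collections import defaultdict
from fnmatch import fnmatchcase


def build_index(docs):
    """Map every word (no stoplist) to {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][doc_id].append(pos)
    return index


def search(index, pattern):
    """Return (doc_id, hit_count) pairs, documents with most hits first."""
    counts = defaultdict(int)
    for word, postings in index.items():
        if fnmatchcase(word, pattern.lower()):
            for doc_id, positions in postings.items():
                counts[doc_id] += len(positions)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def hits_in_context(text, pattern, window=3):
    """Return each matching word with `window` words on either side."""
    words = re.findall(r"\w+", text)
    snippets = []
    for i, word in enumerate(words):
        if fnmatchcase(word.lower(), pattern.lower()):
            snippets.append(" ".join(words[max(0, i - window): i + window + 1]))
    return snippets


docs = {
    1: "Search engines are different from IR systems",
    2: "Quick search sets user ambitions; searching must be simple, search again",
}
index = build_index(docs)
print(search(index, "search*"))
print(hits_in_context(docs[1], "engine*", window=1))
```

Because the index is a plain dictionary keyed by word, its keys double as a vocabulary for look-up, and new documents can be added by running the same indexing loop over them.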
and then came …
… the internet
IR – the Future Dimension
In the beginning web searching was done
without using the experiences and
achievements of IR.
But information was piling up – clearly new
IR techniques were needed.
New rules OK?
• Internet (intranet) is the information carrier
• Applications challenge boundaries
– Private vs. Public
– Individual vs. Institutional
• In the enterprise
– The variety of form increases
– The volume increases
– Control of structure decreases
– Control of sources decreases
A new technology landscape
• Volumes and volumes of storage
• Plenty of computing power
• Huge amounts of bandwidth
The end of precision retrieval?
Precision retrieval is replaced by a new kind of
retrieval, the Quick Search, which sets user
ambitions and requirements.
Search engines are different from IR-systems.
IR must rise to the challenges of Enterprise
Search – in-house search engines.
What is ahead?
• The main stage for development must be the user interface for searching and presentation.
• Make searching even more simple.
Consider this headline from the Technology Guardian,
May 3, 2007:
It’s time for Amazon to turn a new leaf and make
searching for books at its site a whole lot easier.
(article by Wendy M Grossman)
The big challenge
• The big challenge is to achieve enough structure to enable precision search without losing too much recall.
• And without manual intervention – it must be automatic.
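The trade-off between precision and recall can be made concrete with the standard definitions: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The sketch below is only an illustration; the function name and the sample document numbers are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = [2, 4, 6]  # hypothetical documents that truly answer the query

# A broad search finds more of the relevant documents but also more noise.
print(precision_recall([1, 2, 3, 4], relevant))

# A narrow search returns only a sure hit and misses the rest.
print(precision_recall([2], relevant))
```

Narrowing the search raises precision at the cost of recall, which is the balance the challenge above asks to be struck automatically.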
References
Ashford, John and Willett, Peter, Text retrieval and document databases, Bromley: Chartwell-Bratt, 1989 (ISBN: 0-86238-204-1)
Brooks, Terrence A. (2004), ”The Nature of Meaning in the Age of Google”, Information Research, 9(3) paper 180, http://InformationR.net/ir/9-3/paper180.html
Grossman, Wendy M., It’s time for Amazon to turn over a new leaf ---, The Technology Guardian, May 3, 2007, http://www.guardian.co.uk/technology/2007/may/03/comment.guardianweeklytechnologysection1
Lindquist, Mats G., "3RIP-COM: Integrating Information Retrieval and Computerized Conferencing", Proc. Am. Soc. Info. Science, vol. 17 (1980), pp. 71-73.
Lindquist, Mats G., "Information Resources Management (IRM), Yesterday,
Today and Tomorrow", NORD IoD 6, Samfundet för Informationstjänst i
Finland, Helsinki, 1985, pp. 275-280.
O’Neill, Edward T., Lavoie, Brian F., Bennet, Rick (2003), ”Trends in the
Evolution of the Public Web 1998-2002”, D-Lib Magazine 9(4),
http://www.dlib.org/dlib/april03/lavoie/04lavoie.html
Schneider, Karen G., How OPACs Suck, Part 2: The Checklist of Shame,
Posted 04/03/2006 [ http://www.techsource.ala.org/blog/2006/04/how-opacs-suck-part-2-the-checklist-of-shame.html ]