LIS510 lecture 3 Thomas Krichel 2005-02-05


information storage & retrieval

• this area is now more commonly known as information retrieval

• when I dealt with it, I took “storage” to include the organization of the information, which is a bit of a stretch

• Ideally, one needs to know the retrieval needs before designing the organization of the information

information retrieval

• has to do with everything about how the user gets information out of an information system.

• it is different from data retrieval, since the retrieved data has to be “relevant” to the user.

• it is very difficult to define “relevance” objectively.

information retrieval performance

• how was it for you?

• the traditional measures are
– precision = number of relevant documents retrieved divided by the total number of documents retrieved
– recall = number of relevant documents retrieved divided by the total number of relevant documents in the collection

• they only evaluate a search!
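Both measures can be computed for a single search as in this minimal Python sketch. The document ids and relevance judgments are hypothetical, and returning 0.0 for an empty result set or an empty relevant set is an assumption (strictly, the ratio is undefined there).

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of one search.

    retrieved: set of document ids returned by the system
    relevant:  set of document ids judged relevant to the query
    """
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Assumption: report 0.0 instead of dividing by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 10 documents retrieved, 4 of them relevant,
# out of 8 relevant documents in the whole collection.
retrieved = {f"d{i}" for i in range(1, 11)}
relevant = {"d1", "d2", "d3", "d4", "d20", "d21", "d22", "d23"}
print(precision_recall(retrieved, relevant))  # (0.4, 0.5)
```

Note that recall needs the full set of relevant documents in the collection, which in practice is rarely known.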

information retrieval models

• they give a formal account of the search process.

• there are three basic flavors
– Boolean information retrieval
– Vector information retrieval
– Probabilistic information retrieval

• all are mathematical models

• I would also add web information retrieval as a new type
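As an illustration (not taken from the lecture), here is a minimal Python sketch of the first two models on a hypothetical three-document collection: a Boolean AND query over an inverted index, which returns an unranked set, and a vector-model ranking by cosine similarity. Raw term counts are used as weights for simplicity; practical vector systems usually use tf-idf weighting. The probabilistic model is not shown.

```python
import math
from collections import Counter

# Hypothetical toy collection.
docs = {
    "d1": "information retrieval models",
    "d2": "boolean retrieval of information",
    "d3": "web search and advertising",
}

def build_index(docs):
    """Inverted index: term -> set of ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

def boolean_and(index, *terms):
    """Boolean model: documents containing every query term, with no ranking."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def cosine_rank(docs, query):
    """Vector model: rank documents by cosine similarity of term-count vectors."""
    q = Counter(query.lower().split())
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    scores = []
    for doc_id, text in docs.items():
        d = Counter(text.lower().split())
        dot = sum(q[t] * d[t] for t in q)
        if dot:  # skip documents sharing no term with the query
            d_norm = math.sqrt(sum(v * v for v in d.values()))
            scores.append((dot / (q_norm * d_norm), doc_id))
    return sorted(scores, reverse=True)

index = build_index(docs)
print(boolean_and(index, "information", "retrieval"))
print(cosine_rank(docs, "information retrieval"))
```

The Boolean query simply returns {d1, d2}, while the vector model puts d1 ahead of d2 because d1 is shorter, so the query terms make up a larger share of it.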

web information retrieval

• this has become big business now

• finding out what a user needs is a way to connect them with advertising.

• One thing that has made Google such a success is that they discovered a way to make quality web sites appear at the top of the results

• Basically, a quality web site is one that has many links to it from other quality sites.
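This recursive idea, that quality sites are those linked from other quality sites, is what Google's PageRank formalizes. Below is a simplified power-iteration sketch in Python; the damping factor 0.85, the fixed iteration count, the handling of pages without outlinks and the four-page link graph are all assumptions for illustration, not Google's production method.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links: dict mapping every page to the list of pages it links to.
    A page with no outlinks spreads its rank evenly over all pages
    (an assumed, commonly used simplification).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links[p] or pages
            share = damping * rank[p] / len(targets)
            for t in targets:
                new[t] += share  # each page passes rank along its links
        rank = new
    return rank

# Hypothetical web: C is linked from A, B and D; nobody links to D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

C ends up with the highest score because it collects links from three pages, and A comes next because its only inlink comes from the high-ranked C.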

information storage

• can mean the preparation of information before searching
– which fields are searchable?
– can there be a variety of means to rank search results?
– is a controlled vocabulary used?

• it is difficult to draw general conclusions, except that advanced search features are not used much.

human-computer interface

• tries to understand how users work with computer systems

• the idea is to build “user-friendly” systems

• but don’t leave that to a “computer designer” as suggested by Rubin

• note that information systems go way beyond computers.

• Web usability is a big topic.

natural language processing

• Rubin classifies this as part of the human-computer interface

• natural language processing is still in its infancy

• speech recognition is the best developed part

• others are working on connecting computers to the brain

artificial intelligence

• This has been around for a while.

• The field has developed a number of theoretical tools

• Some of them are being used in practice now. Things like RDF, the Resource Description Framework, are based on artificial intelligence theory. RDF is a tool for aggregating knowledge from web resources.

area 3: defining information & its value

• There is debate on the nature of
– data (Thomas: things that can be processed in the information system)
– knowledge (Thomas: stuff that is in people’s heads)
– information (something between data and knowledge). Rubin says it is meaning given to data.

• Rubin also talks about wisdom as “knowledge applied for the benefit of humanity”

scientific view of information

• usually information is modeled as something that reduces uncertainty

• people have a rough idea about something, say tomorrow’s temperature

• we receive information when we learn the precise value this something actually takes (what the temperature will be), or at least when our uncertainty about it is reduced.

• usually this is formalized with probability theory.
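One standard way to make this precise, although the lecture does not name it, is Shannon entropy measured in bits: the information gained is the reduction in entropy. A minimal Python sketch with hypothetical beliefs about tomorrow's temperature:

```python
import math

def entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

# Hypothetical beliefs about tomorrow's temperature in degrees Celsius.
before = {10: 0.25, 11: 0.25, 12: 0.25, 13: 0.25}  # four equally likely values
after_forecast = {11: 0.5, 12: 0.5}                 # a forecast rules out two of them
after_observation = {12: 1.0}                       # the temperature is known

h0 = entropy(before.values())
h1 = entropy(after_forecast.values())
h2 = entropy(after_observation.values())
print(h0, h1, h2)  # 2.0 1.0 0.0
print("information from the forecast:", h0 - h1, "bit")
```

The forecast removes one bit of uncertainty, and observing the actual temperature removes the remaining bit.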

value of information

• economists can value information precisely but their definition is useless for practical purposes

• much of the work then involves some cost/benefit analysis. In such an analysis, one can reach almost any result one wants.

elements of value-added in libraries

• access to resources

• accuracy (for example of bibliographic data)

• browsing (like in library stacks)

• currency (things are up-to-date)

• flexibility (through human interaction)

• formatting (laying out the collection, signs)

• interfacing (probably close to flexibility)

• ordering (buying access to things)

• access to the means of getting to resources

area 4: bibliometrics

• is the application of quantitative methods to the study of information resources

• Mainly concerned with the structure of the resources. The typical example is citation analysis.

• Quantitative studies of use belong more to the first area of interest.

bibliometric laws

• Zipf’s law relates to the usage of terms in text.

• Lotka’s law relates to the number of papers written by authors.

• Bradford’s law relates to the distribution of articles in a field across a number of periodicals.
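A minimal Python sketch of the first two laws, under stated assumptions. Zipf's law predicts that a term's frequency is roughly inversely proportional to its rank, so on a large corpus rank × frequency stays roughly constant. Lotka's law, in its classic form, predicts that the number of authors with n papers is about the number of authors with one paper divided by n²; the exponent 2 is taken here as an adjustable parameter. The sample text and the author count are hypothetical.

```python
from collections import Counter

def rank_frequency(text):
    """Zipf: return (rank, term, frequency) tuples, most frequent term first."""
    counts = Counter(text.lower().split())
    return [(rank, term, freq)
            for rank, (term, freq) in enumerate(counts.most_common(), start=1)]

def lotka_expected(authors_with_one_paper, n, exponent=2.0):
    """Lotka: expected number of authors who wrote exactly n papers."""
    return authors_with_one_paper / n ** exponent

# Hypothetical sample text; a real Zipf plot needs a large corpus.
for rank, term, freq in rank_frequency("the cat and the hat and the bat")[:3]:
    print(rank, term, freq, "rank*freq =", rank * freq)

# Hypothetical field in which 100 authors wrote exactly one paper.
for n in range(1, 5):
    print(n, "papers:", round(lotka_expected(100, n), 1), "authors expected")
```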

citation analysis

• is the heart of bibliometrics.

• Two important concepts
– bibliographic coupling means two documents share one or more references
– co-citation means two documents are cited together by the same documents

• Citation analysis is also important for evaluating scientific activity
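Both concepts reduce to counting over a citation graph. A minimal Python sketch, assuming a hypothetical dict that maps each citing paper to the set of works it references:

```python
from itertools import combinations

def bibliographic_coupling(cites):
    """Coupling strength: number of references shared by each pair of citing papers."""
    strength = {}
    for a, b in combinations(sorted(cites), 2):
        shared = len(cites[a] & cites[b])
        if shared:
            strength[(a, b)] = shared
    return strength

def co_citation(cites):
    """Co-citation count: number of papers that cite both works of a pair."""
    counts = {}
    for refs in cites.values():
        for a, b in combinations(sorted(refs), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

# Hypothetical citation data: paper -> works it cites.
cites = {
    "P1": {"R1", "R2", "R3"},
    "P2": {"R2", "R3"},
    "P3": {"R3", "R4"},
}
print(bibliographic_coupling(cites))
print(co_citation(cites))
```

P1 and P2 are the most strongly coupled pair (they share R2 and R3), and R2 and R3 are the most co-cited pair (both P1 and P2 cite them).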

area 5: management & admin

• This is an expanding area in libraries.

• Rather than collecting physical books, libraries have to negotiate online access.

• The area covers all of information policy. Example problems are
– copyright
– censorship

• Measuring performance is part of user studies

area 6: information architecture

• art and science of organizing information and its interfaces so that seekers find what they want quickly

• mainly used with respect to large web sites. It looks at the contents rather than at technical factors or the look-and-feel

• A related idea is usability

area 7: knowledge management

• this comes from the business environment

• it is a management fad that has overstayed its welcome.

http://openlib.org/home/krichel

Thank you for your attention!