Information Retrieval Techniques Israr Hanif M.Phil QAU Islamabad Ph D (In progress) COMSATS


Page 1

Information Retrieval Techniques

Israr Hanif

M.Phil QAU Islamabad

Ph D (In progress) COMSATS

Page 2

Information Retrieval Techniques

MS(CS) Lecture 1

AIR UNIVERSITY MULTAN CAMPUS

Page 3

Information Retrieval Systems

• Information
  – What is “information”?
• Retrieval
  – What do we mean by “retrieval”?
  – What are the different types of information needs?
• Systems
  – How do computer systems fit into the human information-seeking process?

Page 4

Dictionary says…

• Oxford English Dictionary
  – information: informing, telling; thing told, knowledge, items of knowledge, news
  – knowledge: knowing familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known
• Random House Dictionary
  – information: knowledge communicated or received concerning a particular fact or circumstance; news

Page 5

Intuitive Notions

• Information must
  – Be something, although the exact nature (substance, energy, or abstract concept) is not clear
  – Be “new”: repetition of previously received messages is not informative
  – Be “true”: false or counterfactual information is “mis-information”
  – Be “about” something

Robert M. Losee. (1997) A Discipline Independent Definition of Information. Journal of the American Society for Information Science, 48(3), 254-269.

Page 6

Information Hierarchy

Data → Information → Knowledge → Wisdom

(More refined and abstract)

Page 7

Information Hierarchy

• Data
  – The raw material of information
• Information
  – Data organized and presented in a particular manner
• Knowledge
  – “Justified true belief”
  – Information that can be acted upon
• Wisdom
  – Distilled and integrated knowledge
  – Demonstrative of high-level “understanding”

Page 8

A (Facetious) Example

• Data
  – 98.6º F, 99.5º F, 100.3º F, 101º F, …
• Information
  – Hourly body temperature: 98.6º F, 99.5º F, 100.3º F, 101º F, …
• Knowledge
  – If you have a temperature above 100º F, you most likely have a fever
• Wisdom
  – If you don’t feel well, go see a doctor
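
The first three levels of this example can be mirrored in a few lines of Python. This is only an illustrative sketch: the variable names, the hour labels, and the `has_fever` helper are assumptions made here, while the readings and the 100º F threshold come from the slide.

```python
# Data: raw readings with no context (values from the slide).
data = [98.6, 99.5, 100.3, 101.0]

# Information: the same data organized and labelled.
# The "hour N" labels are an assumption for illustration.
information = {f"hour {i}": temp for i, temp in enumerate(data, start=1)}

# Knowledge: a rule that can be acted upon (threshold from the slide).
FEVER_THRESHOLD_F = 100.0


def has_fever(temp_f: float) -> bool:
    """Return True if the temperature is above the fever threshold."""
    return temp_f > FEVER_THRESHOLD_F


fever_hours = [hour for hour, temp in information.items() if has_fever(temp)]
print(fever_hours)
```

Wisdom is deliberately left out: deciding to see a doctor is a judgment, not a rule over the data.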

Page 9

What types of information?

• Text (Documents and portions thereof)
• XML and structured documents
• Images
• Audio (sound effects, songs, etc.)
• Video
• Source code
• Applications/Web services

Page 10

“Retrieval?”

• “Fetch something” that’s been stored
• Recover a stored state of knowledge
• Search through stored messages to find some messages relevant to the task at hand

[Figure: communication model. Labels: Sender, Encoding, message, storage, message, Decoding, Recipient; noise; indexing/writing; retrieval/reading]

Page 11

What is IR?

• Information retrieval is a problem-oriented discipline, concerned with the problem of the effective and efficient transfer of desired information between human generator and human user

• Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

Nicholas J. Belkin. (1980) Anomalous States of Knowledge as a Basis for Information Retrieval. Canadian Journal of Information Science, 5, 133-143.

Page 12

What is Information Retrieval?

• The process of actively seeking out information relevant to a topic of interest (van Rijsbergen)
  – Typically it refers to the automatic (rather than manual) retrieval of documents
• Information Retrieval System (IRS)
  – “Document” is the generic term for an information holder (book, chapter, article, webpage, etc.)

Page 13

Hopkins IR Workshop 2005 Copyright © Victor Lavrenko

What is Information Retrieval?

• Most people equate IR with web-search
  – highly visible, commercially successful endeavors
  – leverage 3+ decades of academic research
• IR: finding any kind of relevant information
  – web-pages, news events, answers, images, …
  – “relevance” is a key notion (details in Part II)


Page 16

The formalized IR process

[Figure: two paths meet at Matching.
Document side: Collection of documents → Document representations.
User side: Real world → Anomalous state of knowledge → Information need → Query.
Document representations + Query → Matching → Results]

Page 17

What do we want from an IRS?

• Systemic approach
  – Goal (for a known information need): Return as many relevant documents as possible and as few non-relevant documents as possible
• Cognitive approach
  – Goal (in an interactive information-seeking environment, with a given IRS): Support the user’s exploration of the problem domain and the task completion.
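
The systemic goal is commonly quantified with precision (how few non-relevant documents are returned) and recall (how many of the relevant documents are returned). The slide does not name these measures, so the following is a minimal sketch using the standard set-based definitions; the document ids and the `precision_recall` helper are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Set-based precision and recall for a single information need."""
    hits = retrieved & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set and relevance judgments.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four returned documents are relevant (precision 0.50), and two of the three relevant documents were found (recall about 0.67).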

Page 18

The role of an IR system – a modern view –

• Support the user in
  – exploring a problem domain, understanding its terminology, concepts and structure
  – clarifying, refining and formulating an information need
  – finding documents that match the info need description
    • As many relevant docs as possible
    • As few non-relevant documents as possible

Page 19

How does it do this?

• User interfaces and visualization tools for
  – exploring a collection of documents
  – exploring search results
• Query expansion based on
  – Thesauri
  – Lexical/statistical analysis of text / context and concept formation
  – Relevance feedback
• Indexing and matching model
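
As an illustration of the thesaurus-based option, the sketch below expands each query term with its synonyms. It is a toy example under stated assumptions: the `THESAURUS` entries are invented, `expand_query` is a hypothetical helper, and a real system would also weight or filter the added terms.

```python
# Toy thesaurus: entries are invented for illustration only.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "fast": ["quick", "rapid"],
}


def expand_query(query: str, thesaurus: dict[str, list[str]]) -> list[str]:
    """Return each query term followed by its thesaurus synonyms, without duplicates."""
    expanded: list[str] = []
    for term in query.lower().split():
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("fast car", THESAURUS))
```

Terms with no thesaurus entry pass through unchanged, so the expanded query always contains the original one.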

Page 20

How well does it do this?

• Evaluation
  – Of the components
    • Indexing / matching algorithms
  – Of the exploratory process overall
    • Usability issues
    • Usefulness to task
    • User satisfaction

Page 21

Role of the user interface in IR

[Figure: stages of the search task: Problem definition → Source selection → Problem articulation → Examination of results → Extraction of information → Integration with overall task. Other labels in the figure: INPUT, OUTPUT, Engine]

Page 22

The Big Picture

• The four components of the information retrieval environment:
  – User
  – Process
  – System
  – Collection

What computer geeks care about!

What we care about!

Page 23

The Information Retrieval Cycle

[Figure: Source Selection → (Resource) → Query Formulation → (Query) → Search → (Ranked List) → Selection → (Documents) → Examination → (Documents) → Delivery.
Feedback loops: query reformulation, vocabulary learning, relevance feedback; source reselection]

Page 24

Supporting the Search Process

[Figure: Source Selection → (Resource) → Query Formulation → (Query) → Search → (Ranked List) → Selection → (Documents) → Examination → (Documents) → Delivery.
Supporting Search: Acquisition → (Collection) → Indexing → (Index)]
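
The supporting path (acquire a collection, index it, search the index) can be sketched as follows. The slide only says "Index"; the inverted index and the Boolean AND search used here are assumptions chosen for simplicity, and the three-document collection and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(collection: dict[str, str]) -> dict[str, set[str]]:
    """Indexing: map each term to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in collection.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Search: return ids of documents containing every query term (Boolean AND)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Acquisition is stubbed out: the "collection" is a small in-memory dict.
collection = {
    "d1": "information retrieval systems",
    "d2": "database systems",
    "d3": "information needs of users",
}
index = build_index(collection)
print(sorted(search(index, "information systems")))
```

Building the index once up front means each search only touches the postings for the query terms rather than scanning every document.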

Page 25

Simplification?

[Figure: Source Selection → (Resource) → Query Formulation → (Query) → Search → (Ranked List) → Selection → (Documents) → Examination → (Documents) → Delivery.
Feedback loops: query reformulation, vocabulary learning, relevance feedback; source reselection]

Is this itself a vast simplification?

Page 26

The IR Black Box

[Figure: Query + Documents → IR black box → Hits]

Page 27

Inside The IR Black Box

[Figure: Query → Representation Function → Query Representation.
Documents → Representation Function → Document Representation → Index.
Query Representation + Index → Comparison Function → Hits]
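
The components inside the box can be sketched in Python. The slide does not say which representation or comparison function is used, so this sketch assumes a bag-of-words representation and cosine similarity; `represent`, `compare`, `retrieve`, and the sample documents are hypothetical names and data.

```python
import math
from collections import Counter


def represent(text: str) -> Counter:
    """Representation function: lower-cased bag of words with term counts."""
    return Counter(text.lower().split())


def compare(query_rep: Counter, doc_rep: Counter) -> float:
    """Comparison function: cosine similarity between two term-count vectors."""
    dot = sum(count * doc_rep[term] for term, count in query_rep.items())
    norm_q = math.sqrt(sum(c * c for c in query_rep.values()))
    norm_d = math.sqrt(sum(c * c for c in doc_rep.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def retrieve(query: str, index: dict[str, Counter]) -> list[tuple[str, float]]:
    """Score every indexed document against the query; return hits, best first."""
    query_rep = represent(query)
    scored = [(doc_id, compare(query_rep, doc_rep)) for doc_id, doc_rep in index.items()]
    return sorted((h for h in scored if h[1] > 0), key=lambda h: h[1], reverse=True)


documents = {
    "d1": "information retrieval systems",
    "d2": "database systems",
    "d3": "cooking recipes",
}
# Index: here simply the stored document representations.
index = {doc_id: represent(text) for doc_id, text in documents.items()}

hits = retrieve("retrieval systems", index)
print([doc_id for doc_id, _ in hits])
```

The same representation function is applied to the query and to the documents, which is what makes the two representations comparable.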