Introduction to Information Retrieval

Introduction to Information Retrieval · widit2.knu.ac.kr/~kiyang/teaching/gSE/f18/lectures/1.gSE... · 2015-09-19 · Interesting information is scattered like berries among bushes


Introduction to Information Retrieval


What is IR?

Sit down before fact as a little child, be prepared to give up every preconceived notion, follow humbly wherever and to whatever abysses nature leads, or you shall learn nothing.

-- Thomas Huxley --

Search Engines 2

► Google
  • Query = What is IR?
  • Query = What is information retrieval?
► Ask.com
  • Query = What is IR?
  • Query = What is information retrieval?
► Yahoo!
  • Query = What is IR?
  • Query = What is information retrieval?
► Google Korea
  • Query = What is IR?
  • Query = What is information retrieval?
► Naver
  • Query = What is IR?
  • Query = What is information retrieval?
► Daum
  • Query = What is IR?
  • Query = What is information retrieval?


IR: Key Questions

What are we looking for? How do we find it? Why is it difficult?


“A prudent question is one-half of wisdom.”

-- Francis Bacon --


IR: What are we looking for?

We are
► Looking for X.
  • Q&A: population of China
  • Known-item Search: “Catcher in the Rye”
► Looking for something like/about X.
  • General/background info: Taliban
  • Collection Development: IR Literature
  • Similar to (known) X: like “Catcher in the Rye”
  • WhatyoumacallX: “the rye-boy story”
► Looking for something
  • Problem Resolution: how can we fight terrorism?
  • Knowledge Development: what is IR?
► Looking
  • Need something, but don’t know what: what’s it all about?
  • Serendipity: Web surfing


IR: How do we find it?

Brute force search
► Easy to build, maintain, and use
► Searcher does all the work; hard to get satisfaction

Organize/structure the data
► Intuitive to use
► Hard to build and maintain
► Knowledge of builder’s language & organization structure is crucial

Use a search tool
► Easier to build and maintain: less manipulation of data
► Sometimes works, sometimes not (helps to know the language of the data)

Ask the experts
► Easy and satisfying to use (by definition)
► “Expert” knowledge is transitory, hard to encapsulate

Go with the crowd
► Relatively easy to build and maintain
► Limited utility: doesn’t work with “unpopular” X

Zen-Fusion search.
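The contrast between brute force search and using a search tool can be sketched in a few lines of Python. This is only one possible reading of the slide, not the lecture's own code: the three sample documents and the function names (`brute_force_search`, `build_inverted_index`, `index_search`) are assumptions made up for illustration. The brute-force version rescans every document on each query, while the search tool builds an inverted index once and then looks terms up.

```python
from collections import defaultdict

# Hypothetical mini-collection (not from the slides).
DOCS = {
    1: "interesting information is scattered like berries",
    2: "a search tool needs a searchable index",
    3: "information retrieval creates a searchable index",
}


def brute_force_search(term, docs):
    """Scan the full text of every document on every query."""
    return sorted(doc_id for doc_id, text in docs.items() if term in text.split())


def build_inverted_index(docs):
    """Map each word to the set of documents that contain it (built once)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


def index_search(term, index):
    """Look the term up directly instead of rescanning the collection."""
    return sorted(index.get(term, set()))


inverted = build_inverted_index(DOCS)
scan_hits = brute_force_search("index", DOCS)
index_hits = index_search("index", inverted)
no_hits = index_search("indexes", inverted)  # exact matching: plural form is not found
```

Both approaches return the same documents for "index", but the query "indexes" finds nothing, which illustrates why it helps to know the language of the data.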


Information Seeking Process: Dynamic, Interactive, Iterative

User
► What am I looking for? - Identification of info. need
► What question do I ask? - Query formulation

Intermediary
► What is the searcher looking for? - Discovery of user’s info. need
► How should the question be posed? - Query representation
► Where is the relevant information? - Query-document matching

Information
► What data to collect? - Collection development
► What information to index? - Indexing/Representation
► How to represent it? - Data structure


Information Seeking Models

Berry-picking Model
► Interesting information is scattered like berries among bushes.
► Information seeking is a dynamic, non-linear process, where information needs and queries continually shift.
► Information needs are not satisfied by a single, final retrieved set of documents, but rather by a series of selections and bits of information found along the way.

Traditional Model
► Linear process:
  1. Problem identification
  2. Identification of information need
  3. Query formulation
  4. Result evaluation
► Static information need
► The goal is to retrieve a perfect match of the information need


Bates, 1989; Broder, 2002


IR Research: Overview


Information Organization - Add structure & annotation

Information Retrieval - Create a searchable index

Information Access - Retrieve information

Data Mining - Discover Knowledge


IR Research: Information Retrieval


Representation - indexing, term weighting
► Raw Data → Searchable Index

Raw Data (figure schematic)
  D1: wd1 wd2 wd3
  D2: wd2 wd4 wd1 wd2
  D3: wd1 wd4

Example documents
  D1: information retrieval seminars
  D2: retrieval models and information retrieval
  D3: information model

Searchable Index (term frequency per document)
  Index Term           D1  D2  D3
  wd1 (information)     1   1   1
  wd2 (model)           0   1   1
  wd3 (retrieval)       1   2   0
  wd4 (seminar)         1   0   0

Query Formulation - “What is information retrieval?”

Search Results - (ranked) document list
  Rank  docID  score
  1     D2     3
  2     D1     2
  3     D3     1
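The term table and ranking above can be reproduced with a short Python sketch. It assumes the score is simply the sum of the query terms' frequencies in each document, which matches the numbers on the slide but is not stated there. The crude plural stripping in `tokenize` and the function names are illustrative assumptions, not the lecture's method.

```python
import re
from collections import Counter

# Documents and index terms as given on the slide.
DOCS = {
    "D1": "information retrieval seminars",
    "D2": "retrieval models and information retrieval",
    "D3": "information model",
}
INDEX_TERMS = ["information", "model", "retrieval", "seminar"]


def tokenize(text):
    """Lowercase, strip a trailing 's' (naive stemming), keep index terms only."""
    words = re.findall(r"[a-z]+", text.lower())
    stems = [w[:-1] if w.endswith("s") else w for w in words]
    return [w for w in stems if w in INDEX_TERMS]


def build_index(docs):
    """Term frequencies per document: the searchable index."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}


def rank(query, index):
    """Score = sum of query-term frequencies (assumed scoring rule)."""
    query_terms = set(tokenize(query))
    scores = {doc_id: sum(tf[t] for t in query_terms) for doc_id, tf in index.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = build_index(DOCS)
ranking = rank("What is information retrieval?", index)

for position, (doc_id, score) in enumerate(ranking, start=1):
    print(position, doc_id, score)
```

Only "information" and "retrieval" survive tokenization of the query, so D2 (1 + 2) outranks D1 (1 + 1) and D3 (1 + 0). Real systems typically replace raw counts with weighted terms, which is the "term weighting" part of the representation step.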


IR Research: Information Organization


Representation - NLP & Machine Learning

Organized Data Raw Data

Query Formulation - “What is IR?”

Search Results - document groups


IR Research: Natural Language Processing

Goal
► Understanding/effective processing of natural language
  • Not just pattern matching

Research area, technique, tool for
► Knowledge Discovery, Data Mining

Lexical Analysis using
► Part-of-Speech (POS) tagging
► Sentence Parsing


IR Research: Machine Learning

Research area, technique, tool for
► Information Organization, Knowledge Discovery, Data Mining

Information Organization via
► Supervised Learning (Automatic Classification)
► Unsupervised Learning (Clustering)


[Figure: Classification vs. Clustering; groups labeled Class 1 and Class 2]
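The difference between classification and clustering can be made concrete with a small Python sketch. It is a toy illustration under stated assumptions: the 2-D points are invented, a nearest-centroid rule stands in for supervised learning, and a basic k-means loop (seeded with the first k points so the run is deterministic) stands in for unsupervised learning. None of these choices come from the slides.

```python
from math import dist


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def train_nearest_centroid(labeled):
    """Supervised: labels are given, so learn one centroid per class."""
    by_class = {}
    for point, label in labeled:
        by_class.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in by_class.items()}


def classify(point, centroids):
    """Assign the label of the closest class centroid."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))


def kmeans(points, k, iterations=10):
    """Unsupervised: no labels, so group points by proximity alone."""
    centers = list(points[:k])  # deterministic seed: first k points (assumption)
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return groups


# Hypothetical data: two well-separated groups of 2-D points.
LABELED = [
    ((1.0, 1.0), "Class 1"), ((1.2, 0.8), "Class 1"),
    ((5.0, 5.0), "Class 2"), ((5.2, 4.9), "Class 2"),
]
POINTS = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1), (4.8, 5.1)]

model = train_nearest_centroid(LABELED)
predicted = classify((0.9, 1.1), model)
clusters = kmeans(POINTS, k=2)
```

Classification returns a named class because the training data carried labels; clustering only returns unnamed groups, and deciding what each group means is left to the analyst.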


IR Research: Lifecycle

1. Identify a research question
2. Find out what others have done (i.e. Literature Review)
3. Design an experiment
   i. Form a hypothesis
   ii. Determine specifications (task, data, system, evaluation, user)
   iii. Construct a strategy to accomplish task
4. Conduct the experiments
   i. Design an IR system architecture based on the experiment design
   ii. Implement the system
   iii. Tune system modules with training data
   iv. Execute retrieval runs with test data
5. Write papers
   i. Analyze results
   ii. Execute post-experiment runs
   iii. Analyze the post-experiment results
   iv. Write a conference paper
   v. Present the paper at a conference
   vi. Conduct a follow-up study
   vii. Analyze the follow-up study results
   viii. Write a journal paper


What is the Text REtrieval Conference (TREC)?

Annual Information Retrieval conference
► Sponsored by
  • National Institute of Standards & Technology (NIST)
  • Defense Advanced Research Projects Agency (DARPA)
  • Other U.S. agencies (e.g. DOD)
► Attended by
  • International researchers from academic, commercial, and government institutions

Goals
► Advance IR research based on large-scale data
► Refine IR evaluation methodologies
► Create test collections for various aspects of IR
► Stimulate exchange of ideas & communication among academia, industry, and government


Voorhees, 2014


TREC Tasks: Tracks


Voorhees, 2014