Course Overview: An Introduction to Information Retrieval and Applications J. H. Wang Feb. 17, 2014


IR, Spring 2014 NTUT CSIE 2

Instructor & TA

• Instructor
  – J. H. Wang (王正豪)
  – Associate Professor, CSIE, NTUT
  – Office: R1534, Technology Building
  – E-mail: [email protected]
  – Tel: ext. 4238
  – Office hours: 9:00 am to 12:00 noon, every Tuesday and Thursday
• TA
  – Mr. Huang (R1424, Technology Building)
    • Available time: Monday morning or Tuesday afternoon
    • E-mail: jsn900211 @ gmail.com


Course Description
• Course web page (for the latest announcements and updates to the schedule, slides, and homework):
  – http://www.ntut.edu.tw/~jhwang/IR/
• Time: 9:10 am to 12:00 noon, Friday
• Classroom: R334, Technology Building
• Textbook:
  – Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schuetze, Introduction to Information Retrieval, Cambridge University Press, 2008.
    • Available online
    • International Student Edition, imported by Kai-Fa (開發) Publishing
• Prerequisites:
  – Basic knowledge of data structures and algorithms, linear algebra, and probability theory
  – Programming experience is *required* for homework and projects


Target Audience

• Seniors
• Graduate students
• IGPEECS (International Graduate Program in Electrical Engineering and Computer Science)


Additional References

• References:
  – Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search, Addison-Wesley, 2011.
    • This is the second edition of their 1999 book Modern Information Retrieval. (華通)
  – W. Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, Addison-Wesley, 2010. (全華)
  – Stefan Buettcher, Charles L. A. Clarke, and Gordon V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.


More Books on IR
• Gerald Salton, Automatic Information Organization and Retrieval, McGraw-Hill, 1968.
• Gerald Salton and M. J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, 1983.
  – Two classics, but out of print.
• C. J. van Rijsbergen, Information Retrieval, Butterworths, 1979.
  – The classic. More than 30 years old, but still worth reading.
• K. Sparck Jones and P. Willett, Readings in Information Retrieval, Morgan Kaufmann, 1997.
  – A collection of classic IR papers (out of print).
• I. H. Witten, A. Moffat, and T. C. Bell, Managing Gigabytes, 2nd edition, Morgan Kaufmann, 1999.
  – The authority on index construction and compression.


Grading Policy

• Homework assignments and programming exercises: ~40%

• Mid-term exam: ~25%
• Term project: ~35%
  – Including proposal, presentation, and final report


Programming Exercises and Term Project

• About 3 programming exercises
  – Team-based, with a maximum team size of:
    • 4 for undergraduates
    • 2 for graduate students
  – You may either write your own code or reuse existing open-source code
• The term project
  – Either team-based system development (same team rules as the programming exercises)
  – Or an academic paper presentation
    • Only one person per team is allowed
  – A proposal is *required* before the midterm (Apr. 11, 2014)


About the Term Project

• Your score depends on the functionality, difficulty, and quality of your project
  – For system development:
    • System functions and correctness
  – For an academic paper presentation:
    • Quality of the paper and of your presentation
    • Major methods and experimental results *must* be presented
    • Papers from top conferences are strongly recommended
      – E.g., SIGIR, WWW, CIKM, WSDM, JCDL, ICMR, …
• A proposal is *required* for each team and will count toward the score


Online Submission

• Submission instructions
  – Programs, project proposals, and project reports must be submitted as electronic files to the TA online at:
    • Submission website: http://140.124.183.31/net2ftp
  – Login instructions:
    • FTP server: localhost
    • User name & password: your student ID


What this Course is NOT about

• This course will NOT teach you:
  – Tips and tricks for using search engines, although power users might have better ideas on how to improve them
    • There are plenty of books and websites on that…
  – How to find books in libraries, although that is somewhat related to basic IR concepts
  – How to make money on the Web, although the currently largest search engine has done so


What’s Information Retrieval?

• Things you have been doing all day!
  – Searching for something interesting: Web pages, news, e-mail, images, videos, …
  – Asking for advice
  – …
• User interests change all the time…
  – 2011: New Zealand earthquake
  – 2012: Jeremy Lin
  – 2013: Russian meteor
  – 2014: ? (next slide)


What’s Information Retrieval?


In Google News


In Web Pages


In Wikipedia


In Google Images


Different keywords: Ukraine riots


More related keywords


What If We Search in Chinese?


Related Keywords

• Ukraine
• Ukraine riots
• Ukraine crisis
• Kiev
• Protest
• Truce
• 2014 Hrushevskoho Street riots
• …


Related Keywords in Chinese

• 烏克蘭 (Ukraine)
• 基輔 (Kiev)
• 示威 (demonstration)
• 衝突 (clash)
• 危機 (crisis)
• 鎮壓 (crackdown)
• …
• And this can go on:
  – for other languages…
  – for other search engines…
  – and for social websites…


In Google Trends


And Social Search…


How do I Know What People Care about?


What Are People in Taiwan Searching for on That Day?


What Is Information Retrieval?

• “Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.” (Salton, 1968)


Goal

• Information retrieval (IR): a research field that aims to search for information in text and multimedia documents both effectively and efficiently

• In this course, we will introduce the basic text and query models in IR, retrieval evaluation, indexing and searching, and applications of IR


A Big Picture


[Figure: overview of an IR system. Components: User Interface, Text Operations, Query Expansion, Indexing, Retrieval, Ranking, Inverted Index, Document Collection. Labeled flows: user need, query, user feedback, retrieved docs, ranked docs, doc representation (logical view), inverted file, text.]
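The flow in the diagram above can be traced end to end in a few lines of code. The sketch below is a toy, in-memory version under stated assumptions: lowercasing plus splitting on non-alphanumeric characters is the only text operation, the inverted index is a dictionary from term to {doc ID: term frequency}, retrieval is conjunctive (AND), and ranking uses summed raw term frequency as a stand-in score. All function names and documents are hypothetical; real systems use much richer text operations and scoring models, and query expansion and user feedback are omitted here.

```python
import re
from collections import Counter, defaultdict


def text_operations(text):
    """Hypothetical text operations: lowercase, then split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Indexing: map each term to {doc_id: term frequency} (the 'inverted file')."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text_operations(text)).items():
            index[term][doc_id] = tf
    return index


def retrieve(index, query):
    """Retrieval: return ids of documents containing every query term (AND)."""
    terms = text_operations(query)
    if not terms:
        return set()
    # index.get avoids inserting empty entries for unknown terms.
    postings = [set(index.get(t, {})) for t in terms]
    return set.intersection(*postings)


def rank(index, query, doc_ids):
    """Ranking: order retrieved docs by summed query-term frequency (toy score)."""
    terms = text_operations(query)
    scores = {d: sum(index[t][d] for t in terms) for d in doc_ids}
    # Highest score first; ties broken by doc id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical toy collection.
docs = {
    1: "Ukraine riots in Kiev",
    2: "Kiev protest: riots continue, riots spread",
    3: "Jeremy Lin scores again",
}
index = build_inverted_index(docs)
hits = retrieve(index, "Kiev riots")
print(rank(index, "Kiev riots", hits))  # prints [2, 1]
```

Document 2 ranks first because "riots" occurs twice in it; document 3 is never scored because the AND retrieval step already filtered it out.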


Topics

• Text IR
  – Indexing and searching
  – Query languages and operations
• Retrieval evaluation
• Modeling
  – Boolean model
  – Vector space model
  – Probabilistic model
• Applications of IR
  – Multimedia IR
  – Web search
  – Digital libraries
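For the retrieval-evaluation topic, the two most basic measures compare the set of retrieved documents with the set of documents judged relevant for a query. Precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A minimal sketch, assuming binary relevance judgments and unranked (set-based) results; the function name and the convention of returning 0.0 for an empty denominator are assumptions for illustration:

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall under binary relevance judgments.

    retrieved: iterable of doc ids returned by the system
    relevant:  iterable of doc ids judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents retrieved, 3 of them relevant; 6 relevant documents exist in total.
p, r = precision_recall([1, 2, 3, 4], [2, 3, 4, 7, 8, 9])
print(p, r)  # prints 0.75 0.5
```

The example shows the usual trade-off: this system is fairly precise but misses half of the relevant documents. Ranked-retrieval measures extend these set-based definitions.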


Organization of the Textbook

• Basics in IR (focus)
  – Inverted indexes for Boolean queries (Ch. 1-5)
  – Term weighting and the vector space model (Ch. 6-7)
  – Evaluation in IR (Ch. 8)
• Advanced topics
  – Relevance feedback (Ch. 9)
  – XML retrieval (Ch. 10)
  – Probabilistic IR (Ch. 11)
  – Language models (Ch. 12)
• Machine learning in IR (useful)
  – Text classification (Ch. 13-15)
  – Document clustering (Ch. 16-18)
• Web search
  – Web crawling and indexes (Ch. 19-20)
  – Link analysis (Ch. 21)
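The core operation behind Boolean AND queries over an inverted index is intersecting postings lists. If each list is kept sorted by document ID, two lists of lengths x and y can be merged in O(x + y) time by advancing two pointers. A minimal sketch, assuming postings are plain Python lists of integer doc IDs in ascending order; the function names and the postings lists are hypothetical toy data:

```python
def intersect(p1, p2):
    """Intersect two ascending-sorted postings lists with a linear merge."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])  # this doc contains both terms
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer at the smaller doc ID
        else:
            j += 1
    return answer


def intersect_many(postings_lists):
    """AND several postings lists together, shortest first to keep intermediates small."""
    if not postings_lists:
        return []
    ordered = sorted(postings_lists, key=len)
    result = ordered[0]
    for plist in ordered[1:]:
        if not result:
            break  # nothing left that could match
        result = intersect(result, plist)
    return result


# Toy postings lists for three hypothetical terms.
brutus = [1, 2, 4, 11, 31, 45, 173, 174]
caesar = [1, 2, 4, 5, 6, 16, 57, 132]
calpurnia = [2, 31, 54, 101]
print(intersect(brutus, caesar))                    # prints [1, 2, 4]
print(intersect_many([brutus, caesar, calpurnia]))  # prints [2]
```

Processing the shortest list first is a common heuristic: the intermediate result can never be longer than the smallest list, so later merges stay cheap.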


Some Overlap with Other Fields

• Text mining, information extraction
• Machine learning
• Natural language processing
• Social network analysis
• …


Pointers to Other Topics

• Cross-language IR
• Image, video, and multimedia IR
• Speech retrieval
• Music retrieval
• User interfaces
• Parallel, distributed, and P2P IR
• Digital libraries
• Information science perspective
• Logic-based approaches to IR
• Natural language processing techniques
• …


Tentative Schedule

• Before midterm
  – Boolean retrieval (1 wk)
  – Indexing (2 wks)
  – Vector space model and evaluation (2 wks)
  – Relevance feedback (1 wk)
  – Probabilistic IR (2 wks)

• After midterm
  – Text classification (1-2 wks)
  – Document clustering (1-2 wks)
  – Web search (2 wks)
  – Advanced topics: CLIR, IE, … (2 wks)
  – Term project presentations (3 wks)


Generic Resources

• Wikipedia page on Information Retrieval: http://en.wikipedia.org/wiki/Information_retrieval

• Information Retrieval Resources: http://www-csli.stanford.edu/~hinrich/information-retrieval.html


Academic Resources
• Journals
  – ACM TOIS: ACM Transactions on Information Systems
  – JASIST: Journal of the American Society for Information Science and Technology
  – IP&M: Information Processing and Management
  – IEEE TKDE: IEEE Transactions on Knowledge and Data Engineering
• Conferences
  – ACM SIGIR: International ACM SIGIR Conference on Research and Development in Information Retrieval
  – WWW: World Wide Web Conference
  – ACM CIKM: ACM International Conference on Information and Knowledge Management
  – JCDL: ACM/IEEE Joint Conference on Digital Libraries
  – ACM WSDM: ACM International Conference on Web Search and Data Mining
  – TREC: Text REtrieval Conference


Teaching in English…

• Slides and lectures will be offered mainly in English

• To help domestic students follow along, important concepts will be briefly summarized in Chinese


Thanks for Your Attention!

• Any questions or comments? Please feel free to e-mail [email protected] or come to my office to discuss them with me.