Overview of Information Retrieval and our Solutions

Qiang Yang

Department of Computer Science and Engineering, The Hong Kong University of Science and Technology

Hong Kong

Why Do We Need Information Retrieval (IR)?

The amount of online information keeps growing (information overload)

Many tasks rely on effective management and exploitation of information

Textual information plays an important role in our lives

Effective text management directly improves productivity

What is IR?

Narrow sense:

IR = Search engine technologies (Google/Yahoo!/Live Search)

IR = Text matching/classification

Broad sense: IR = Text information management:

How to find useful information? (info. retrieval) (e.g., Yahoo!)

How to organize information? (text classification) (e.g., automatically assign email to different folders)

How to discover knowledge from text? (text mining) (e.g., discover correlation of events)

Difficulties

Huge amount of online data

Yahoo! has nearly 20 billion pages in its index (as collected at the beginning of 2005)

Different types of data: Web pages, emails, blogs, chat-room messages

Ambiguous queries

Short: 2-4 words

Ambiguous: apple; bank…

Our Solutions

Query Classification: Champion of KDDCUP’05; TOIS (Vol. 24); SIGIR’06; KDD Exploration (Vol. 7)

Query Expansion/Suggestion: Submissions to SIGIR’07; AAAI’07; KDD’07

Entity Resolution: Submission to SIGIR’07

Web Page Classification/Clustering: SIGIR’04; CIKM’04; ICDM’04; ICDE’06; WWW’06; IPM (2007); DMKD (Vol. 12)

Document Summarization: SIGIR’05; IJCAI’07

Analysis of Blogs, Emails, Chat-room Messages: SIGIR’06; ICDM’06 (2); IJCAI’07

Outline

Query Classification (QC)

Introduction

Solution 1: Query/category enrichment

Solution 2: Bridging classifiers

Entity Resolution

Summary of other work

Query Classification

Introduction

Web queries are difficult to manage: short; ambiguous; evolving

Query Classification (QC) can help to understand queries better:

Vertical search

Re-ranking search results

Online advertisements

Difficulties of QC (different from text classification)

How to represent queries

Target taxonomy is dynamic, e.g. the online ads taxonomy

Training data is difficult to collect

Problem Definition

Inspired by the KDDCUP’05 competition

Classify a query into a ranked list of categories

Queries are collected from real search engines

Target categories are organized in a tree with each node being a category
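As a rough illustration of this problem setup (not the method presented in these slides), the sketch below models the target taxonomy as a tree in which every node is a category, and returns a ranked list of category paths for a query. The category names, the keyword sets, and the keyword-overlap score are all assumptions invented for the example; a real system would need a far richer query representation.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """A node in the target taxonomy; every node is itself a category."""
    name: str
    keywords: set = field(default_factory=set)   # hypothetical hand-picked terms
    children: list = field(default_factory=list)


def iter_paths(node, prefix=()):
    """Yield (path, node) for every category in the tree, root included."""
    path = prefix + (node.name,)
    yield path, node
    for child in node.children:
        yield from iter_paths(child, path)


def classify(query, root, top_k=3):
    """Return up to top_k category paths ranked by keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = []
    for path, node in iter_paths(root):
        score = len(terms & node.keywords)
        if score > 0:
            scored.append((score, "/".join(path)))
    # Highest score first; ties broken alphabetically for a stable ranking.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored[:top_k]]


# Toy taxonomy, invented for illustration only.
taxonomy = Category("Root", children=[
    Category("Computers", {"computer", "apple", "laptop"}, [
        Category("Hardware", {"laptop", "mac", "apple"}),
    ]),
    Category("Living", {"food"}, [
        Category("Food", {"apple", "fruit", "recipe"}),
    ]),
])

print(classify("apple fruit recipe", taxonomy))
```

With the bare query "apple", all three keyword-bearing categories tie on score, which mirrors the ambiguity problem noted earlier: a short query alone gives little evidence for ranking one category above another.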

Related Work

Document Classification

Feature selection [Yang et al. 1997]

Feature generation [Cai et al. 2003]

Classification algorithms: Naïve Bayes [Andrew and Nigam 1998]; KNN [Yang 1999]; SVM [Joachims 1999]; ……

An overall survey in [Sebastiani 2002]

Related Work

Query Classification/Clustering

Classify Web queries by geographical locality [Gravano 2003]

Classify queries according to their functional types [Kang 2003]

Beitzel et al. studied topical classification, as we do; however, they rely on manually classified data [Beitzel 2005]

Beeferman and Wen each worked on query clustering using clickthrough data [Beeferman 2000; Wen 2001]
